CHAPTER I


INTRODUCTION 


The primary purpose of this study is to compare Kendall's method with linear discriminant analysis in two kinds of cases. In the first, the assumptions of equal variance-covariance matrices and multivariate normality both hold. In the second, either or both of these assumptions fail. The basis of comparison is the probability of misclassification.

Consider two populations, $\Pi_1$ and $\Pi_2$, and suppose that samples of size $n_1$ and $n_2$ are available from $\Pi_1$ and $\Pi_2$, respectively. Let $f_i(x)$ denote the density function of the random vector $x$ in $\Pi_i$. It is frequently assumed that $f_1(x)$ and $f_2(x)$ are multivariate normal with means $\mu_1$ and $\mu_2$, respectively, and a common variance-covariance matrix $\Sigma$. Under this assumption, with equal prior probabilities and equal costs of misclassification, the following linear function minimizes the probability of misclassification:

$$(\mu_1 - \mu_2)'\Sigma^{-1}x - \tfrac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2). \qquad (1)$$

This is the linear discriminant function, a form of which Fisher first introduced in 1936 (1).

When the parameters are estimated from the samples, the sample discriminant function is obtained:

$$D(x) = (\bar{x}_1 - \bar{x}_2)'S^{-1}x - \tfrac{1}{2}(\bar{x}_1 - \bar{x}_2)'S^{-1}(\bar{x}_1 + \bar{x}_2), \qquad (2)$$

where

$$\bar{x}_i = \frac{1}{n_i}\sum_{\alpha=1}^{n_i} x_{i\alpha}, \qquad i = 1, 2, \qquad (3)$$

and

$$S = \frac{1}{n_1 + n_2 - 2}\sum_{i=1}^{2}\sum_{\alpha=1}^{n_i}(x_{i\alpha} - \bar{x}_i)(x_{i\alpha} - \bar{x}_i)'. \qquad (4)$$
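Equations (2)-(4) can be computed directly. The sketch below is for illustration only. It assumes NumPy, and the function name `sample_ldf` and the small two-variate samples are hypothetical; they are not taken from this study. The sketch uses the zero cutoff implied by (2): it assigns $x$ to $\Pi_1$ when $D(x) > 0$ and to $\Pi_2$ otherwise.

```python
import numpy as np


def sample_ldf(X1, X2):
    """Sample linear discriminant function D(x) of equation (2).

    X1, X2: arrays of shape (n1, p) and (n2, p) holding the two samples.
    """
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)            # sample means, eq. (3)
    # Pooled covariance matrix S with divisor n1 + n2 - 2, eq. (4)
    S = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2 - 2)
    a = np.linalg.solve(S, m1 - m2)                      # S^{-1}(xbar1 - xbar2)
    c = 0.5 * a @ (m1 + m2)                              # constant term of (2)
    return lambda x: np.asarray(x, float) @ a - c


# Hypothetical two-variate samples, for illustration only.
X1 = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 2.0], [3.0, 3.0]])
X2 = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, -1.0]])
D = sample_ldf(X1, X2)
labels = np.where(D(np.vstack([X1, X2])) > 0, 1, 2)      # 1 -> Pi_1, 2 -> Pi_2
print(labels)
```

Because $S^{-1}$ is symmetric, $D$ is positive at $\bar{x}_1$, negative at $\bar{x}_2$, and zero midway between them.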


Much work has been done on determining the probabilities of misclassification in linear discriminant analysis, particularly for the sample discriminant function (2, 3, 4, 5, 6, 7).

Methods of nonparametric discriminant analysis are of interest because, as noted above, the assumptions of multivariate normality and equal variance-covariance matrices required in linear discriminant analysis are frequently unacceptable. M. G. Kendall (8, 9) has suggested a method of nonparametric discriminant analysis, which he sometimes calls the "order-statistic" method. In this method the variates are examined one at a time.

Consider, for example, the ith variate, shown in Figure 1. The variate values from $\Pi_1$ are indicated by x's and the values from $\Pi_2$ by v's. Below .4 there are four values from $\Pi_1$ and none from $\Pi_2$. Above .85 there are three observations from $\Pi_2$ and none from $\Pi_1$. There are thus seven values outside the region of overlap. The lower and upper cutoff points are .4 and .85, respectively.

All of the variates are examined in the same way. The variate with the largest number of values outside the region of overlap is selected as the first discrimination variate, and its cutoff points become the discrimination cutoff points. All observations whose values for that variate lie below the lower cutoff point or above the upper cutoff point are removed from further consideration, because they have been classified. The procedure continues with the remaining observations and the remaining variates. When the procedure is finished, a set of classification rules has been obtained. In this case Rule 1 would be as follows:

Rule 1

    x_i < .4              assign to $\Pi_1$
    x_i > .85             assign to $\Pi_2$
    .4 ≤ x_i ≤ .85        see Rule 2

Figure 1. Illustrative Example of Kendall's Method
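A minimal sketch of the single-variate step follows, written in plain Python. The names `kendall_cutoffs` and `apply_rule` are hypothetical. The sample values are also hypothetical: they were chosen only so that the counts match those described for Figure 1. The lower cutoff is the larger of the two sample minima, and the upper cutoff is the smaller of the two sample maxima. The cutoff points themselves therefore lie inside the region of overlap.

```python
def kendall_cutoffs(x1, x2):
    """One variate of the order-statistic method (illustrative sketch).

    x1, x2: values of the variate in the samples from populations 1 and 2.
    Returns (lower, upper, n_outside): the overlap region is lower <= v <= upper,
    and n_outside counts the values lying strictly outside it.
    """
    lower = max(min(x1), min(x2))   # larger of the two sample minima
    upper = min(max(x1), max(x2))   # smaller of the two sample maxima
    values = list(x1) + list(x2)
    n_outside = sum(v < lower for v in values) + sum(v > upper for v in values)
    return lower, upper, n_outside


def apply_rule(v, lower, upper, low_pop, high_pop):
    """Population assigned by one rule, or None meaning 'see the next rule'.

    low_pop is the population whose sample minimum is smaller; high_pop the one
    whose sample maximum is larger.
    """
    if v < lower:
        return low_pop
    if v > upper:
        return high_pop
    return None


# Hypothetical values chosen to mimic Figure 1 (not data from the study).
pi1 = [0.12, 0.20, 0.31, 0.37, 0.52, 0.66, 0.85]
pi2 = [0.40, 0.48, 0.58, 0.71, 0.90, 0.96, 1.05]
lower, upper, n_out = kendall_cutoffs(pi1, pi2)
print(lower, upper, n_out)                      # 0.4 0.85 7
print(apply_rule(0.30, lower, upper, 1, 2))     # 1
print(apply_rule(0.60, lower, upper, 1, 2))     # None
```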


Early in the study, the statistical literature was searched for examples of multivariate data that could be used to test the feasibility of Kendall's method. Seven examples were found, and Kendall's method was applied to each of these data sets.
To investigate the effect of unequal variance-covariance matrices, the following variance-covariance matrices were considered:

$$\Sigma_1 = (1 - \rho_1)I + \rho_1 E_{pp} \qquad (5)$$

$$\Sigma_2 = \sigma^2[(1 - \rho_2)I + \rho_2 E_{pp}] \qquad (6)$$

Here $E_{pp}$ is a $p \times p$ matrix of 1's, and $1 > \rho_i > -1/(p - 1)$ for $i = 1, 2$.
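The restriction on $\rho_i$ keeps (5) and (6) positive definite. The eigenvalues of $(1-\rho)I + \rho E_{pp}$ are $1-\rho$, with multiplicity $p-1$, and $1+(p-1)\rho$. Both are positive exactly when $-1/(p-1) < \rho < 1$. The NumPy sketch below builds matrices of this form and checks the bound. The helper name `equicorrelation` and the parameter values are illustrative assumptions.

```python
import numpy as np


def equicorrelation(p, rho, sigma2=1.0):
    """sigma2 * [(1 - rho) I + rho E_pp], the form of equations (5) and (6).

    Assumes p >= 2; E_pp is the p x p matrix of ones.
    """
    if not -1.0 / (p - 1) < rho < 1.0:
        raise ValueError("need -1/(p-1) < rho < 1 for a positive definite matrix")
    return sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))


Sigma1 = equicorrelation(5, 0.5)              # rho1 = 0.5
Sigma2 = equicorrelation(5, 0.2, sigma2=2.0)  # rho2 = 0.2, sigma^2 = 2
# Expected, up to rounding: 1 - rho = 0.5 (four times) and 1 + 4 rho = 3.0
print(np.linalg.eigvalsh(Sigma1))
```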

These matrices were chosen because they are not uncommon in biological and psychological work and may be good approximations in many other situations. Variance-covariance matrices of the forms (5) and (6) have been considered in a number of studies of discriminant analysis. In 1945 Beall (10) introduced an approximate method for calculating discriminant functions that assumes equal variances and equal covariances. He cited the earlier empirical evidence of Jackson (11) that this assumption was not unreasonable. Later (1946-47), Penrose (12) developed the concept of size and shape components for the case $\Sigma_1 = \Sigma_2$ and $\rho_1 = \rho_2$. In 1963 Bartlett and Please (13) considered the general case in which $\Sigma_1$ and $\Sigma_2$ are given by (5) and (6) and there is no mean difference between the populations. They applied the method to some measurements on twins. Geisser and Desu (14) later gave a Bayesian analysis of the same problem. Han (1968) (15) derived the discriminant function for unequal mean vectors, and later (1969) (16) studied the distribution of the discriminant function when $\rho_1 = \rho_2$.

Sampling experiments were performed using the variance-covariance matrices $\Sigma_1$ and $\Sigma_2$. Two p-variate normal populations, $\Pi_1$ and $\Pi_2$, were considered, with means $\mu_1$ and $\mu_2$ and variance-covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. In the experiments, $\rho_1$, $\rho_2$, $\sigma^2$ and $\mu_2$ were varied, while $\mu_1$ was always set equal to the zero vector. All experiments used $p = 5$.

Initial samples of size 20 (sometimes 50 or 100) were generated from populations $\Pi_1$ and $\Pi_2$, and Kendall's discrimination rules were derived from them. These rules were then applied to samples of size 500 from each of $\Pi_1$ and $\Pi_2$. The entire procedure was repeated 50 times, each time with new samples. The same procedure, using the same set of random numbers, was carried out with the linear discriminant function. The average probabilities of misclassification estimate the expected probability of misclassification when these discrimination procedures are applied.

The sampling experiments also provided the data needed to compare the empirical probabilities of misclassification for linear discriminant analysis with the theoretical values that Gilbert obtained for the case $\Sigma_2 = d\Sigma_1$, that is, when one variance-covariance matrix is a multiple of the other.

Some sampling experiments were also done with multivariate nonnormal distributions (all variables independent). The Cauchy and uniform distributions were selected because they are difficult to distinguish from the normal on the basis of a small sample. The lognormal distribution was also considered, as an example of an asymmetric distribution.

Another main purpose of this study is to develop a modified Bartlett and Please method. These authors obtained a linear discriminant function for the case of zero mean differences when the variance-covariance matrices have the form already considered:

$$\Sigma_1 = (1 - \rho_1)I + \rho_1 E_{pp}$$

$$\Sigma_2 = \sigma^2[(1 - \rho_2)I + \rho_2 E_{pp}]$$

However, as A. Kshirsagar has noted, Bartlett and Please did not correctly obtain the cutoff point that gives equal probabilities of misclassification for their function. This thesis develops a procedure that does provide this cutoff point. The procedure is applied to the data considered by Bartlett and Please, and the results are compared. The modified Bartlett and Please method is also compared with Kendall's method in the case of zero mean differences.

Finally, a number of other nonparametric discrimination procedures are examined and compared with Kendall's method.


CHAPTER II


KENDALL'S NONPARAMETRIC DISCRIMINANT ANALYSIS METHOD

2.1 Description of Method

The simplest way to explain the order-statistic method is through an example. Kendall (8, 9) used Fisher's Iris data. We will consider a multivariate example from geology, given by Krumbein and Graybill (17) and based on the work of Link (18). It concerns discrimination between two carbonate subenvironments, clear shallow water and abundant-algae water. The discrimination uses two physicochemical variables and two measures of sedimentary texture.

The data are given in Table 1. The variables are:

    V1: Eh below the interface
    V2: pH below the interface
    V3: phi mean diameter
    V4: phi standard deviation

Consider now Table 2. There the data for group 1 (clear, shallow water) and group 2 (abundant algae water) have been combined, and the measurements of each variable have been ordered separately from low to high. Consider V4. The V4 measurements of the two groups overlap in the range .97 to 1.67. Below .97, however, all of the measurements belong to group 1, and above 1.67 all of them belong to group 2. A total of 13 V4 values therefore lie outside the region of overlap: four from group 1 and nine from group 2. Examination of Table 2 shows that fewer observations lie outside the common range for each of V1, V2, and V3 than for V4.

TABLE 2

DATA OF TABLE 1 ORDERED FROM LOW TO HIGH VALUES
OF EACH PARAMETER SEPARATELY

   V1      V2      V3      V4
 -383    4.28     .13     .10
 -264    4.30     .78     .56
 -261    4.34     .82     .79
 -235    4.44     .88     .94
 -225    4.60    1.22     .97
 -224    4.74    1.37    1.01
 -214    4.80    1.52    1.08
 -214    4.86    1.68    1.13
 -214    4.89    1.70    1.13
 -213    5.19    1.72    1.20
 -200    5.42    1.90    1.21
 -193    5.42    1.91    1.23
 -174    5.53    1.93    1.30
 -170    5.54    1.93    1.33
 -158    5.65    2.01    1.41
 -158    5.86    2.12    1.51
 -157    5.86    2.14    1.55
 -107    5.86    2.17    1.57
  -79    6.10    2.31    1.60
  -76    6.29    2.31    1.64
  -45    6.56    2.38    1.67
  -36    6.86    2.41    1.78
    0    6.92    2.51    2.22
   34    7.08    2.59    2.43
   43    7.22    2.85    2.72
   48    7.56    2.90    2.79
   74    7.92    3.14    2.84
   83    7.97    3.16    2.86
  104    8.36    3.52    2.91
  110    8.93    5.30    3.20

V4 is therefore used for the first discrimination rule:

1.  V4 < .97               assign to group 1    (4)
    V4 > 1.67              assign to group 2    (9)
    .97 ≤ V4 ≤ 1.67        see Rule 2           (17)

(The number in parentheses is the number of observations to which the preceding statement applies; for example, there are four observations with V4 < .97.)

The 13 cases discriminated by Rule 1 are then removed from further consideration. The remaining data are given in Table 3. V3 is now the most discriminating variable, so Rule 2 becomes:

2.  .97 ≤ V4 ≤ 1.67
      V3 < 1.22            assign to group 1    (4)
      V3 > 2.17            assign to group 1    (1)
      1.22 ≤ V3 ≤ 2.17     see Rule 3           (12)

The remaining data are given in Table 4. Rule 3 is:

3.  .97 ≤ V4 ≤ 1.67
    1.22 ≤ V3 ≤ 2.17
      V2 < 4.86            assign to group 2    (2)
      V2 > 7.22            assign to group 2    (3)
      4.86 ≤ V2 ≤ 7.22     see Rule 4           (7)


TABLE 3

DATA OF TABLE 2 REMAINING AFTER DISCRIMINATING WITH V4

   V1      V2      V3
 -383    4.34     .13
 -264    4.60     .78
 -261    4.86     .82
 -224    5.19     .88
 -200    5.42    1.22
 -193    5.53    1.52
 -158    5.65    1.68
 -158    5.86    1.70
 -107    5.86    1.72
  -76    6.29    1.90
  -36    7.08    1.91
    0    7.22    1.93
   34    7.56    1.93
   43    7.92    2.01
   48    7.97    2.12
   74    8.36    2.17
  104    8.93    2.38


TABLE 4

DATA OF TABLE 3 REMAINING AFTER DISCRIMINATING WITH V4 AND V3

   V1      V2
 -383    4.34
 -264    4.60
 -224    4.86
 -200    5.19
 -193    5.42
 -158    5.86
 -107    5.86
  -76    6.29
  -36    7.22
   43    7.92
   48    7.97
   74    8.93


The remaining sample data (V1) are:

-383, -264, -200, -158, -107, 43, 74
Finally, Rule 4 is:

4.  .97 ≤ V4 ≤ 1.67
    1.22 ≤ V3 ≤ 2.17
    4.86 ≤ V2 ≤ 7.22
      V1 < -264            assign to group 2    (1)
      V1 ≥ -264            assign to group 1    (6)

Residual group: 0

All 30 samples have thus been assigned correctly. Krumbein and Graybill, using linear discriminant analysis, found that 7 of the 30 samples were misclassified. Sampling variation is, of course, a concern here: the rules derived from this particular example may perform poorly on a new sample. This problem is examined in detail later.

2.2 Application to Some Additional Examples in the Statistical Literature

The statistical literature was searched for further examples of multivariate data that could be analyzed by Kendall's method. Cochran (19) gives a convenient list of 12 numerical applications of linear discriminant analysis reported in the literature. Few of these papers could be used, however, because the data were either not in a convenient form or the individual observations were not listed. Seven examples, including Fisher's Iris data, were eventually found; they are described in Table 5. One of them, that of Krumbein and Graybill, has already been considered in Section 2.1.

Table 6 compares the results of applying Kendall's order-statistic method and linear discriminant analysis to each data set. Each of these examples except the Iris data is discussed below. In Table 6, N(i,i) is the number of observations from the ith population that were correctly assigned to the ith population. The last column of Table 6 gives the number of variates Kendall's method used for classification.


TABLE 5

VARIABLE AND POPULATION DESCRIPTIONS FOR DATA SETS
USED WITH KENDALL'S METHOD

AUTHOR                   VARIABLES                             POPULATIONS

1. Fisher                Sepal and petal length and width      2 species of Iris:
                         of Iris                                 a. Versicolor and Virginica
                                                                 b. Setosa and Virginica
                                                                 c. Setosa and Versicolor

2. Beall                 4 psychological tests                 Men and women

3. Tintner               Length, amplitude, rate of change,    Consumers' and producers'
                         etc., in price cycle                  goods

4. Dempster              Renal blood pressure as a function    Control group and treated
                         of time                               group of laboratory animals

5. Krumbein & Graybill   Electrochemical measurements of       Water samples from two
                         water sample; grain size and          carbonate environments
                         sorting measurements

6. Mosteller & Tukey     Word frequency occurrence             Papers by Hamilton and
                                                               Madison

7. Beerstecher et al     Metabolic measurements                Alcoholic and nonalcoholic
                                                               individuals


TABLE 6

COMPARISON OF RESULTS OF USING KENDALL'S METHOD AND LINEAR DISCRIMINANT ANALYSIS
ON DATA SETS DESCRIBED IN TABLE 5

These examples were considered in order to examine the feasibility of applying Kendall's method in a wide variety of situations.

Beall (10) - Four psychological tests were given to 32 men and 32 women [Table 7]. The aim is to find which test results differentiate between men and women. Kendall's method gives the following set of rules:

1.  V3 ≥ 28              assign to men        (18)
    V3 ≤ 8               assign to women      (3)
    8 < V3 < 28          see Rule 2           (43)

2.  8 < V3 < 28
      V1 ≥ 20            assign to men        (1)
      V1 ≤ 7             assign to women      (2)
      7 < V1 < 20        see Rule 3           (40)

TABLE 7

THE SCORES OF 32 MEN AND 32 WOMEN
ON FOUR PSYCHOLOGICAL TESTS.

DATA FROM BEALL (10).

          Men                     Women
   1     2     3     4      1     2     3     4
  15    17    24    14     13    14    12    21
  17    15    32    26     14     1    14    26
  15    14    29    23     12    19    21    21
  13    12    10    16     12    13    10    16
  20    17    26    28     11    20    16    16
  15    21    26    21     12     9    14    18
  15    13    26    22     10    13    18    24
  13     5    22    22     10     8    13    23
  14     7    30    17     12    20    19    23
  17    15    30    27     11    10    11    27
  17    17    26    20     12    18    25    25
  17    20    28    24     14    18    13    26
  15    15    29    24     14    10    25    28
  18    19    32    28     13    16     8    14
  18    18    31    27     14     8    13    25
  15    14    26    21     13    16    23    28
  18    17    33    26     16    21    26    26
  10    14    19    17     14    17    14    14
  18    21    30    29     16    16    15    23
  18    21    34    26     13    16    23    24
  13    17    30    24      2     6    16    21
  16    16    16    15     14    16    22    26
  11    15    25    23     17    17    22    28
  16    13    26    16     16    13    16    14
  16    13    23    21     15    14    20    26
  18    18    34    24     12    10    12     9
  16    15    28    27     14    17    24    23
  15    16    29    24     13    15    18    20
  18    19    32    23     11    16    18    28
  18    16    33    23      7     7    19    18
  17    20    21    21     12    15     7    28
  19    19    30    28      6     5     6    13
3.  8 < V3 < 28
    7 < V1 < 20
      V4 ≥ 24            assign to women      (14)
      V4 ≤ 9             assign to women      (1)
      9 < V4 < 24        see Rule 4           (25)

4.  8 < V3 < 28
    7 < V1 < 20
    9 < V4 < 24
      V2 ≥ 21            assign to men        (1)
      V2 ≤ 5             assign to men        (1)

Residual group: 23 (11 men + 12 women)

Dempster (20) - (Data from H. D. Sylwestrowicz of CIBA.) [Table 8] This example involves a type of data frequently found in pharmaceutical experimentation. Nine variables were measured on 19 animals. All nine are measurements of renal blood pressure, taken at half-hour intervals over four hours. The animals had been randomly divided into two groups of sizes 12 and 7. The first group was the control; the second received a specific drug treatment after the first of the nine measurements had been taken. V1 may therefore be considered a covariate. Kendall's method gives the following (non-unique) set of rules:

1.  V8 > 7               assign to group 1 (control)   (11)
    V8 < -8              assign to group 2             (5)
    -8 ≤ V8 ≤ 7          see Rule 2                    (3)
TABLE 8

MEASUREMENTS OF RENAL BLOOD PRESSURE TAKEN
AT ONE-HALF HOUR INTERVALS ON TREATED
AND UNTREATED ANIMALS.

DATA FROM DEMPSTER (20)

Group      V1    V2    V3    V4    V5    V6    V7    V8    V9
Control    17    27    17    17    25    25    25    15    17
            5     5     2     2     5    10    10    12    12
           20    20    20    20    18    17    17    17    15
            8    17     8    15    25    25    25    25    27
           22    22    20    20    15    12    18    13    12
           13    17    17    12    17    17    17    17     7
           35    23    25    23    28    27    42    42    30
           45    43    37    33    35    35    33    32    30
            2     5     2    -5    -7   -10    -8    -8   -18
           33    37    22    28    32    30    30    27    28
           25    35    22    28    28    30    28    25    22
           32    47    48    47    47    47    47    48    47
Treated    45    -2     2     0    -5    -5   -10   -10   -12
           -3   -27   -30   -33   -35   -35   -33   -33   -33
           32    17    12    12     7     2     2     7     7
           30    -2   -10   -12   -12   -12   -12   -13   -13
           13   -20   -22   -22   -73   -27   -27   -28   -28
           20    18     2   -13   -18   -18   -22   -22   -23
           22    18     8    -8   -10    -8    -7    -2     0


2.  -8 ≤ V8 ≤ 7
      V9 ≥ 0             assign to group 2             (2)
      V9 ≤ -18           assign to group 1             (1)

Residual group: 0

Two points are worth making. First, V7 would give as good a result as V8. Second, although V1 was not used as a covariate here, the user of Kendall's method should realize that a covariate could be important. For example, a hypothetical case could arise in which all of the variable measurements overlap considerably, but the difference between the subsequent measurements and the initial measurement is the key to discrimination. Kendall's method applied directly to such data could give poor results. A transformation such as subtracting the initial measurement from each subsequent measurement could then improve discrimination.
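The transformation suggested above can be tried on Table 8. The sketch below is plain Python, and the helper `outside_overlap` is hypothetical. It counts the values lying outside the region of overlap for V8, and then for the difference V8 - V1, using the values as transcribed in Table 8. It only illustrates the idea and is not an analysis reported in this study. For these two columns the count rises from 16 animals to all 19.

```python
# V1 and V8 columns of Table 8, as transcribed above.
control_v1 = [17, 5, 20, 8, 22, 13, 35, 45, 2, 33, 25, 32]
control_v8 = [15, 12, 17, 25, 13, 17, 42, 32, -8, 27, 25, 48]
treated_v1 = [45, -3, 32, 30, 13, 20, 22]
treated_v8 = [-10, -33, 7, -13, -28, -22, -2]


def outside_overlap(a, b):
    """Number of values of two samples lying outside their common range."""
    lower = max(min(a), min(b))
    upper = min(max(a), max(b))
    return sum(v < lower or v > upper for v in a + b)


raw = outside_overlap(control_v8, treated_v8)

# Subtract the initial (pre-treatment) measurement V1 from V8.
diff_control = [v8 - v1 for v1, v8 in zip(control_v1, control_v8)]
diff_treated = [v8 - v1 for v1, v8 in zip(treated_v1, treated_v8)]
adjusted = outside_overlap(diff_control, diff_treated)

print(raw, adjusted)   # 16 19
```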

Tintner (21) [Table 9] - This example concerns distinguishing the prices of producers' goods from the prices of consumers' goods by means of certain measurements of their behavior during a business cycle. The data consist of the monthly wholesale prices of nine consumers' goods and ten producers' goods over the period 1860-1913. The seasonal and trend components had been removed by a moving-average method. The variables are:

    V1: median length of the cycle, in months
    V2: median duration of cyclically rising prices, as a percentage of the total duration of the cycle
    V3: median cyclical amplitude, as a percentage of the trend
    V4: mean monthly rate of change in the cycle
TABLE 9

CYCLICAL MEASUREMENTS OF THE PRICES OF CONSUMERS'
AND PRODUCERS' GOODS. DATA FROM TINTNER (21).

                        V1      V2      V3      V4
Consumers' Goods
  Rice                  72      50       8     0.5
  Tea                 66.5      48      15     1.0
  Sugar                 54      57      14     1.0
  Flour                 67      60      15     0.9
  Coffee                44      57      14     0.3
  Potatoes              41      52      18     1.9
  Butter              34.5      50       4     0.5
  Cheese              34.5      46     8.5     1.0
  Beef                  24      54       3     1.2
Producers' Goods
  Gasoline              57      57    12.5     0.9
  Lead                 100      54      17     0.5
  Pig Iron             100      32    16.5     0.7
  Copper              96.5      65    20.5     0.9
  Zinc                  79      51      18     0.9
  Tin                 78.5      53      18     1.2
  Rubber                48      50      21     1.6
  Quicksilver          155      44    20.5     1.4
  Copper Sheets         84      64      13     0.8
  Iron Bars            105      35      17     1.8


Kendall's method gives the following set of rules:

1.  V1 < 48              assign to consumers' goods   (5)
    V1 > 72              assign to producers' goods   (8)
    48 ≤ V1 ≤ 72         see Rule 2                   (6)

2.  48 ≤ V1 ≤ 72
      V2 < 50            assign to consumers' goods   (1)
      V2 > 57            assign to consumers' goods   (1)
      50 ≤ V2 ≤ 57       see Rule 3                   (4)

3.  48 ≤ V1 ≤ 72
    50 ≤ V2 ≤ 57
      V3 < 12.5          assign to consumers' goods   (1)
      V3 > 14            assign to producers' goods   (1)
      12.5 ≤ V3 ≤ 14     see Rule 4                   (2)

4.  48 ≤ V1 ≤ 72
    50 ≤ V2 ≤ 57
    12.5 ≤ V3 ≤ 14
      V4 ≤ .9            assign to producers' goods   (1)
      V4 > .9            assign to consumers' goods   (1)

Residual group: 0
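The sequential procedure used in these examples can be sketched in a few lines of Python with NumPy. Two features of the sketch are assumptions rather than statements from the source. First, each variate is used at most once, following Section 2.1, where the procedure continues "with the remaining variates". Second, ties in the number of values outside the region of overlap go to the lowest-numbered variate. Applied to Table 9, the sketch selects V1, V2, V3 and V4 in turn. It classifies 13, 2, 2 and 2 goods at these steps and leaves an empty residual group, in agreement with Rules 1-4 above. At the last step the two ranges no longer overlap, so the reported lower cutoff exceeds the upper one; the assignments still agree with Rule 4.

```python
import numpy as np


def overlap_split(x1, x2):
    """Cutoff points of one variate and the number of values outside the overlap."""
    lower = max(x1.min(), x2.min())
    upper = min(x1.max(), x2.max())
    both = np.concatenate([x1, x2])
    return lower, upper, int(np.sum(both < lower) + np.sum(both > upper))


def kendall_rules(g1, g2):
    """Sequential order-statistic procedure (sketch).

    g1, g2: arrays of shape (n1, p) and (n2, p). Each step picks the unused
    variate with the most values outside the overlap region (ties -> lowest
    index) and keeps only the observations inside [lower, upper].
    Returns the steps (variate, lower, upper, n_classified) and the sizes of
    the residual (unclassified) groups.
    """
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    unused = list(range(g1.shape[1]))
    steps = []
    while unused and len(g1) and len(g2):
        best = None
        for j in unused:
            lower, upper, n_out = overlap_split(g1[:, j], g2[:, j])
            if best is None or n_out > best[3]:
                best = (j, lower, upper, n_out)
        j, lower, upper, n_out = best
        if n_out == 0:          # no remaining variate separates anything
            break
        steps.append(best)
        g1 = g1[(g1[:, j] >= lower) & (g1[:, j] <= upper)]
        g2 = g2[(g2[:, j] >= lower) & (g2[:, j] <= upper)]
        unused.remove(j)
    return steps, (len(g1), len(g2))


# Table 9, columns V1..V4.
consumers = [[72, 50, 8, 0.5], [66.5, 48, 15, 1.0], [54, 57, 14, 1.0],
             [67, 60, 15, 0.9], [44, 57, 14, 0.3], [41, 52, 18, 1.9],
             [34.5, 50, 4, 0.5], [34.5, 46, 8.5, 1.0], [24, 54, 3, 1.2]]
producers = [[57, 57, 12.5, 0.9], [100, 54, 17, 0.5], [100, 32, 16.5, 0.7],
             [96.5, 65, 20.5, 0.9], [79, 51, 18, 0.9], [78.5, 53, 18, 1.2],
             [48, 50, 21, 1.6], [155, 44, 20.5, 1.4], [84, 64, 13, 0.8],
             [105, 35, 17, 1.8]]

steps, residual = kendall_rules(consumers, producers)
for j, lower, upper, n in steps:
    print(f"V{j + 1}: cutoffs {lower}, {upper}; {n} classified")
print("residual:", residual)
```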

Mosteller and Tukey (22) - [Table 10]. This example concerns disputed authorship. A number of papers were written by either Hamilton or Madison, and it is of interest to be able to determine the correct author. The example uses 11 papers of known authorship by each author, mostly selected from the Federalist papers. The variables used to discriminate between the authors are the rates of occurrence of certain high-frequency words. Mosteller and Tukey selected this example to illustrate the application of the jackknife method in discriminant analysis.


TABLE 10

RATES OF OCCURRENCE OF HIGH FREQUENCY WORDS* IN SOME OF THE
WRITINGS OF HAMILTON AND MADISON. DATA FROM MOSTELLER AND
TUKEY (22).

             "and"    "in"    "of"    "the"    "to"
              V1       V2      V3      V4       V5
Hamilton     16.1     35.3    63.9     93.3    38.4
             32.2     24.5    78.2    110.0    31.4
             24.3     23.5    64.7     90.8    42.3
             18.0     27.2    59.6     86.8    35.9
             20.6     26.9    61.4     83.6    39.5
             21.8     17.4    73.1     90.4    35.6
             27.9     23.1    61.9     85.4    41.3
             28.5     26.1    71.3     74.5    33.3
             28.9     20.9    56.9     82.7    44.9
             21.3     25.0    60.4     82.2    47.7
             18.5     30.7    72.7    109.3    36.6
Madison      31.6     19.9    54.8     93.8    38.6
             37.3     23.3    56.8     84.2    31.0
             21.2     17.5    58.2     97.6    39.9
             27.9     19.1    55.8     93.1    33.5
             40.7      9.3    59.0     71.5    33.6
             24.4     27.9    60.0    115.3    34.8
             27.7     17.7    61.1    115.3    32.7
             28.1     22.3    57.0    110.9    29.7
             30.6     23.6    68.3    118.6    23.2
             33.9     21.8    64.9     93.7    33.6
             23.3     31.4    34.8     94.3    49.6

*All rates of occurrence of high frequency words are per thousand words of text.


Kendall's method gives the following set of rules:

1.  V3 > 68.3            assign to Hamilton   (4)
    V3 ≤ 56.8            assign to Madison    (4)
    56.8 < V3 ≤ 68.3     see Rule 2           (14)

2.  56.8 < V3 ≤ 68.3
      V5 > 39.9          assign to Hamilton   (4)
      V5 < 35.9          assign to Madison    (6)
      35.9 ≤ V5 ≤ 39.9   see Rule 3           (4)

3.  56.8 < V3 ≤ 68.3
    35.9 ≤ V5 ≤ 39.9
      V2 ≥ 26.9          assign to Hamilton   (3)
      V2 ≤ 17.5          assign to Madison    (1)

Residual group: 0


Beerstecher et al (23) - (The data are too extensive to include here.) In this study, 62 variables related to metabolic patterns were measured in 12 individuals over a period of one month. It was a preliminary study of the traits of alcoholic and nonalcoholic individuals, with the alcoholics measured during nondrinking periods. Its purpose was to discover differences worthy of further study. Kendall's method separated all 12 individuals into the correct classes using two variates. The most discriminating variate was the hippuric acid concentration in urine. Three of the four alcoholics had concentrations above 17 units, and seven of the eight nonalcoholics had concentrations below 17 units. The remaining two subjects were correctly classified by a phagocytic index, although many other variates would undoubtedly have served as well for these two subjects.

In Beerstecher's study, univariate t-tests were used to isolate important variates differentiating the alcoholics from the nonalcoholics. The hippuric acid urine concentration and the saliva sodium concentration had the largest t-values.

It is not surprising that Kendall's method gave compatible results. A large t-value for a variate indicates wide separation between the means of the two samples, which tends to cause Kendall's method to select that variate as a discriminating variate.

This data set was examined with Kendall's method to illustrate a potentially valuable use of the method as a screening technique for significant variates. This use is considered further in Section 6.1.


CHAPTER III


COMPARISON OF KENDALL'S METHOD WITH LINEAR
DISCRIMINANT ANALYSIS

3.1 Method of Evaluation

The two assumptions of concern in linear discriminant analysis are equality of variance-covariance matrices and multivariate normality. Kendall's method is compared with linear discriminant analysis in cases where both assumptions are valid and in cases where either or both are invalid.

Sampling experiments were done using the variance-covariance matrices

$$\Sigma_1 = (1 - \rho_1)I + \rho_1 E_{pp}$$

$$\Sigma_2 = \sigma^2[(1 - \rho_2)I + \rho_2 E_{pp}]$$

Two p-variate normal populations, $\Pi_1$ and $\Pi_2$, were considered, with means $\mu_1$ and $\mu_2$ and variance-covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. In the experiments, $\rho_1$, $\rho_2$, $\sigma^2$ and $\mu_2$ were varied, while $\mu_1$ was always equal to zero. All experiments used $p = 5$.

The particular sets of parameters used are given in Table 11.

The study was carried out on a CDC 6400 computer, which has a 60-bit word length.
TABLE 11

PARAMETER VALUES USED IN SAMPLING EXPERIMENTS

0*, 1, 2      0, 1, 2      0, 1, 2      0, 1, 2

*0' = (0, 0, 0, 0, 0)


To generate a sample vector with variance-covariance matrix $\Sigma$, the vector

$$x' = (x_1, x_2, x_3, x_4, x_5)$$

was first generated, where the $x_i$ are independent uniform (0, 1) variables. Applying the inverse of the normal probability integral, $\Phi^{-1}$, gave the vector

$$y' = (y_1, y_2, y_3, y_4, y_5),$$

where $y_i = \Phi^{-1}(x_i)$. Thus

$$y \sim MVN(0, I).$$

The variance-covariance matrix $\Sigma$ was factored into the product of a lower triangular matrix $T$ and its transpose by a Crout factorization (24), so that $\Sigma$ is expressed in the form

$$\Sigma = TT'.$$

Multiplying $y$ by $T$ produced the desired vector:

$$Ty \sim MVN(0, TT'),$$

that is,

$$Ty \sim MVN(0, \Sigma).$$

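The algorithm for the Crout factorization, with $T = (t_{ij})$ and $\Sigma = (\sigma_{ij})$, is as follows. The first column is

$$t_{11} = \sigma_{11}^{1/2}, \qquad (7)$$

$$t_{i1} = \sigma_{i1}/t_{11}, \qquad i = 2, 3, \ldots, p. \qquad (8)$$

Once the preceding columns $k < j$ have been completed, the jth diagonal term is calculated by

$$t_{jj} = \Big(\sigma_{jj} - \sum_{k=1}^{j-1} t_{jk}^2\Big)^{1/2}. \qquad (9)$$

If $j < p$, the elements below the diagonal are computed by the formula

$$t_{ij} = t_{jj}^{-1}\Big(\sigma_{ij} - \sum_{k=1}^{j-1} t_{ik}t_{jk}\Big), \qquad i = j+1, \ldots, p. \qquad (10)$$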
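Equations (7)-(10) and the generation scheme described above can be sketched in Python. The sketch uses NumPy together with the standard-library `statistics.NormalDist` for $\Phi^{-1}$. The function names, the example $\Sigma$ and the seed are illustrative assumptions. For a positive definite $\Sigma$ the factor produced by (7)-(10) is the Cholesky factor, which gives a convenient check against `numpy.linalg.cholesky`.

```python
import numpy as np
from statistics import NormalDist


def crout_lower(Sigma):
    """Lower-triangular T with Sigma = T T', computed column by column as in (7)-(10)."""
    Sigma = np.asarray(Sigma, float)
    p = Sigma.shape[0]
    T = np.zeros((p, p))
    for j in range(p):
        # Diagonal term: equation (9); for the first column this reduces to (7).
        T[j, j] = np.sqrt(Sigma[j, j] - np.sum(T[j, :j] ** 2))
        # Terms below the diagonal: equation (10); for the first column, (8).
        for i in range(j + 1, p):
            T[i, j] = (Sigma[i, j] - np.sum(T[i, :j] * T[j, :j])) / T[j, j]
    return T


_std_normal = NormalDist()


def mvn_vector(T, rng):
    """One vector distributed MVN(0, T T'), generated as described in the text.

    An exact 0.0 from rng.random() would make inv_cdf raise, but that event
    has negligible probability.
    """
    u = rng.random(T.shape[0])                           # independent uniform(0, 1)
    y = np.array([_std_normal.inv_cdf(x) for x in u])    # y ~ MVN(0, I)
    return T @ y                                         # T y ~ MVN(0, T T')


p, rho = 5, 0.5
Sigma = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
T = crout_lower(Sigma)
rng = np.random.default_rng(6400)
sample = np.array([mvn_vector(T, rng) for _ in range(20000)])
print(np.round(np.cov(sample, rowvar=False), 2))        # close to Sigma
```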


As explained in the introduction, initial samples of size 20 (sometimes 50 or 100) were generated from populations $\Pi_1$ and $\Pi_2$, and Kendall's discrimination rules were derived from them. These rules were then applied to new samples of size 500 from each of $\Pi_1$ and $\Pi_2$. The entire procedure was repeated 50 times, each time with new samples. The same procedure, using the same set of random numbers, was carried out with the linear discriminant function.

Consider one of the 50 repetitions of the experiment. Let F(i,j) denote the fraction of the initial sample from population j which was classified as population i, and F(0,j) denote the fraction of the initial sample from population j which was not classified. With Kendall's method, F(1,1), F(2,2), F(0,1), and F(0,2) were calculated. Since no probabilities of misclassification were allowed, F(1,2) = F(2,1) = 0. With linear discriminant analysis, F(1,1), F(2,2), F(1,2), and F(2,1) were calculated using the resubstitution method. In linear discriminant analysis every observation is classified, so F(0,1) = F(0,2) = 0.

The set of rules derived from the initial sample was applied to a new sample of size 500 from each population. Let FI(i,j) (i = 0,1,2; j = 1,2) denote the fraction of this index sample classified as indicated. FI(i,j) (i = 0,1,2; j = 1,2) was calculated for Kendall's classification rules, and FI(i,j) (i = 1,2; j = 1,2) was calculated for the linear discriminant function.

This was done for all 50 repetitions, and the F's and FI's were averaged to give estimates of the expected value of the probability of misclassification (and classification) based on the initial sample and the index sample, respectively. The average value of F(i,j) (i = 0,1,2; j = 1,2) is denoted by P(i,j); the average value of FI(i,j) (i = 0,1,2; j = 1,2) is denoted by P*(i,j).
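The bookkeeping just described can be sketched as follows. The sketch assumes, as a hypothetical coding, that each observation's assignment is recorded as 1, 2, or 0 (0 meaning "not classified", which only occurs with Kendall's method). Averaging the per-repetition F(i,j) gives P(i,j), and averaging the FI(i,j) gives P*(i,j) in the same way.

```python
def fractions(true_labels, assigned):
    """F(i, j): fraction of the observations from population j assigned to population i.

    assigned[k] is 1 or 2, or 0 when observation k was left unclassified.
    """
    f = {}
    for j in (1, 2):
        members = [k for k, label in enumerate(true_labels) if label == j]
        for i in (0, 1, 2):
            f[(i, j)] = sum(1 for k in members if assigned[k] == i) / len(members)
    return f


def average_over_repetitions(runs):
    """P(i, j) (or P*(i, j)): the mean of F(i, j) (or FI(i, j)) over all repetitions."""
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}


# One repetition with 4 observations per population (hypothetical assignments).
f = fractions([1, 1, 1, 1, 2, 2, 2, 2], [1, 1, 0, 2, 2, 2, 2, 1])
# f[(1, 1)] = 0.5, f[(0, 1)] = 0.25, f[(2, 1)] = 0.25, f[(2, 2)] = 0.75, f[(1, 2)] = 0.25
```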

P(i,j) (i ≠ j) is known to underestimate the expected value of the probability of misclassification of the sample discriminant function. It was calculated for comparison with the Kendall estimates from the initial sample.

The P*(i,j) values are of primary interest, since these measure the probabilities of misclassification when the classification rule or function is actually used. The P(i,j) values are also of interest, since in practice only the sample used to derive the rule or function is available to judge the probability of misclassification when the rule or function is applied. It is therefore important to determine the relationship between the P(i,j) and P*(i,j) values, so that the actual performance of the rule or function can be judged from its performance on the initial sample.

In some of the sampling experiments a principal components transformation was done. The principal components were estimated from the combined samples, and the transformation was then applied separately to the two samples. Kendall's method was then applied to the transformed data.

3.2  Testing  the  Random  Number  Generator 

Since Kendall's method is based on the order statistics of the distribution, it is this aspect of the random number generator which should be examined most carefully. To judge the quality of the random number generator, the following tests were performed. In the first test, 20 independent N(0,1) random numbers were generated and then ordered from low to high. This was repeated 100,000 times, and the average value of each order statistic was calculated. These values were compared with the theoretical values of the order statistics as tabled in Owen (25). The results are given in Table 12; the sampling experiment results agree quite well with the tabled values. In the second test, use was made of the results of Gupta (26), who has calculated the percentage points of order statistics from the normal distribution. Again, a sample of size 20 was generated from an N(0,1) distribution, and this time the percentage points given by Gupta were used to calculate the number of samples which exceeded each percentage
TABLE 12

COMPARISON OF TABLED EXPECTED VALUES OF ORDER STATISTICS
FROM N(0,1) DISTRIBUTION WITH THOSE OBTAINED FROM
SAMPLING EXPERIMENT

Order       Tabled     Sampling      Order       Tabled     Sampling
Statistic   Value      Experiment    Statistic   Value      Experiment
                       Results                              Results

 1          -1.8675    -1.8678       11          0.0620     0.0613
 2          -1.4076    -1.4091       12          0.1870     0.1861
 3          -1.1309    -1.1314       13          0.3149     0.3148
 4          -0.9210    -0.9187       14          0.4483     0.4483
 5          -0.7454    -0.7448       15          0.5903     0.5910
 6          -0.5903    -0.5891       16          0.7454     0.7455
 7          -0.4483    -0.4478       17          0.9210     0.9206
 8          -0.3149    -0.3146       18          1.1309     1.1289
 9          -0.1870    -0.1872       19          1.4076     1.4074
10          -0.0620    -0.0617       20          1.8675     1.8701

point. This was done 10,000 times. The results are given in Table 13. Again, the two results agree very well.
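The first test can be sketched in a few lines. This sketch uses Python's Mersenne Twister `random.gauss` rather than the generator tested in the study, and fewer repetitions (20,000 instead of 100,000), so it only illustrates the procedure. The tabled values are the Owen expectations listed in Table 12.

```python
import random

# Expected values of the order statistics of 20 N(0,1) variables (Owen), as in Table 12.
TABLED = [-1.8675, -1.4076, -1.1309, -0.9210, -0.7454, -0.5903, -0.4483,
          -0.3149, -0.1870, -0.0620, 0.0620, 0.1870, 0.3149, 0.4483,
          0.5903, 0.7454, 0.9210, 1.1309, 1.4076, 1.8675]


def mean_order_statistics(n=20, reps=20000, seed=1):
    """Average of each order statistic over `reps` sorted samples of n N(0,1) draws."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for k, value in enumerate(sample):
            totals[k] += value
    return [total / reps for total in totals]


means = mean_order_statistics()
worst = max(abs(m - t) for m, t in zip(means, TABLED))
print(f"largest |simulated - tabled| = {worst:.4f}")
```

The extreme order statistics have the largest sampling variance, so they are where a deficient generator would show up first.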


3.3 Comparison in the Case of Multivariate Normality, Including Consideration of Unequal Variance-Covariance Matrices


3.3.1  Summary  and  Analysis  of  Main  Sampling  Experiment  Results 

The results of the sampling experiments are given in Tables 14-18, where K denotes Kendall's method; LDA, linear discriminant analysis; and K(PC), Kendall's method with the principal components transformation.
The values for $\sigma^2 = 1$, $\rho_1 = .1$, $\rho_2 = .9$, $\boldsymbol\mu = \mathbf{0}, \mathbf{1}, \mathbf{2}$ have already been calculated in Table 16 in the entries $\sigma^2 = 1$, $\rho_1 = .9$, $\rho_2 = .1$, $\boldsymbol\mu = \mathbf{0}, \mathbf{1}, \mathbf{2}$. The experiment has been repeated, however, to serve as a basis for judging the quality of the estimates. It can be seen from Table 19 that the estimates are stable for $\boldsymbol\mu = \mathbf{1}$ and $\boldsymbol\mu = \mathbf{2}$, but there is variation in the case of zero mean differences.

Tables 20 and 21 summarize the probability of misclassification information for the Kendall method and LDA, respectively. P(i,j) is the probability, estimated from the initial sample, of assigning an observation from the jth population to the ith population. $P_T$ is the average of P(2,1) and P(1,2); $P_T(0)$ is the average of P(0,1) and P(0,2).

P*(i,j) is the probability of assigning an observation from the jth population to the ith population when the classification rules derived from the initial sample are used. $P_T^*$ is the average of P*(2,1) and P*(1,2). The entries are in order of increasing $T^2$ values for $\sigma^2 = 2$, and then of increasing $T^2$ values for $\sigma^2 = 1$, where $T^2$, the Mahalanobis distance, is

$$T^2 = (\boldsymbol\mu_1 - \boldsymbol\mu_2)'\,\Sigma^{-1}(\boldsymbol\mu_1 - \boldsymbol\mu_2) \tag{11}$$

in the case of equal variance-covariance matrices. For $\Sigma_1$ and $\Sigma_2$ of the form considered in this study, the equation used for calculating $T^2$ will be derived:


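$$\Sigma_1 = (1 - \rho_1)I + \rho_1 E_{pp}$$

$$\Sigma_2 = \sigma^2[(1 - \rho_2)I + \rho_2 E_{pp}]$$

$$\Sigma_1 + \Sigma_2 = [(1 + \sigma^2) - (\rho_1 + \sigma^2\rho_2)]I + (\rho_1 + \sigma^2\rho_2)E_{pp} \tag{12}$$

where $E_{pp}$ is the $p \times p$ matrix of ones. With unequal matrices, $\Sigma$ in (11) is taken to be the average $\tfrac12(\Sigma_1 + \Sigma_2)$, so that $T^2 = 2\boldsymbol\mu'(\Sigma_1 + \Sigma_2)^{-1}\boldsymbol\mu$, where $\boldsymbol\mu = \boldsymbol\mu_2 - \boldsymbol\mu_1$ has components $\mu_i$.

Let

$$1 + \sigma^2 = a, \qquad \rho_1 + \sigma^2\rho_2 = b \tag{13}$$

Then $[(a-b)I + bE_{pp}]^{-1}$ is determined using the relationship

$$[I + PQ]^{-1} = I - P(I + QP)^{-1}Q$$

with $P = \frac{b}{a-b}\mathbf{1}$ and $Q = \mathbf{1}'$ (since $E_{pp} = \mathbf{1}\mathbf{1}'$, $QP = pb/(a-b)$ is a scalar). Thus

$$[(a-b)I + bE_{pp}]^{-1} = (a-b)^{-1}\left[I - \frac{b}{(a-b) + pb}E_{pp}\right]$$

Let

$$\frac{b}{(a-b) + pb} = c \tag{19}$$

Then

$$T^2 = \frac{2}{a-b}\,\boldsymbol\mu'(I - cE_{pp})\boldsymbol\mu \tag{20}$$

$$T^2 = \frac{2}{a-b}\left[\sum_i \mu_i^2 - c\Big(\sum_i \mu_i\Big)^2\right] \tag{21}$$

Using equation (19),

$$T^2 = \frac{2}{a-b}\left[\sum_i \mu_i^2 - \frac{b}{(a-b) + pb}\Big(\sum_i \mu_i\Big)^2\right] \tag{22}$$

Finally, using equation (13),

$$T^2 = \frac{2}{(1+\sigma^2) - (\rho_1 + \sigma^2\rho_2)}\left[\sum_i \mu_i^2 - \frac{(\rho_1 + \sigma^2\rho_2)\big(\sum_i \mu_i\big)^2}{(1+\sigma^2) + (p-1)(\rho_1 + \sigma^2\rho_2)}\right] \tag{23}$$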
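A quick numerical check of equation (23) is to compare it with a direct solve of $\tfrac12(\Sigma_1 + \Sigma_2)\mathbf{z} = \boldsymbol\mu$. The sketch below does both. The helper names are hypothetical, and the pooling by $\tfrac12(\Sigma_1 + \Sigma_2)$ is the convention written out in the derivation. For $\rho_1 = .1$, $\rho_2 = .9$, $\sigma^2 = 2$, $\boldsymbol\mu = \mathbf{2}$ the closed form gives about 3.77. This agrees with the distance of 3.78 quoted in Section 3.3.2 up to rounding.

```python
def t_squared(mu, rho1, rho2, sigma2):
    """Equation (23): T^2 with Sigma_1 = (1-rho1)I + rho1 E, Sigma_2 = sigma2[(1-rho2)I + rho2 E]."""
    p = len(mu)
    a = 1.0 + sigma2
    b = rho1 + sigma2 * rho2
    sum_sq = sum(m * m for m in mu)
    sq_sum = sum(mu) ** 2
    return 2.0 / (a - b) * (sum_sq - b * sq_sum / (a + (p - 1) * b))


def t_squared_direct(mu, rho1, rho2, sigma2):
    """Same quantity by Gaussian elimination on ((Sigma_1 + Sigma_2)/2) z = mu."""
    p = len(mu)
    # Augmented matrix [ (Sigma_1 + Sigma_2)/2 | mu ].
    m = [[0.5 * ((1.0 if i == j else rho1) + sigma2 * (1.0 if i == j else rho2))
          for j in range(p)] + [mu[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * p
    for i in reversed(range(p)):
        z[i] = (m[i][p] - sum(m[i][k] * z[k] for k in range(i + 1, p))) / m[i][i]
    return sum(mu[i] * z[i] for i in range(p))


print(t_squared([2.0] * 5, 0.1, 0.9, 2.0))         # about 3.7736
print(t_squared_direct([2.0] * 5, 0.1, 0.9, 2.0))  # same value
```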


With Kendall's method not all of the index sample will be classified. Thus P*(0,1) and P*(0,2) denote the probabilities that an observation from $\Pi_1$ and $\Pi_2$, respectively, will not be classified, and P*(0) denotes the average of P*(0,1) and P*(0,2). These values are summarized in Table 22. They must be taken into consideration in judging the performance of Kendall's method.

The relative performance of the Kendall order-statistic method and LDA (linear discriminant analysis) may be compared by examining Table 23. One column of Table 23 is headed $P_T^*(K)/P_T^*(LDA)$. This is the ratio of the average probability of misclassification of the Kendall method to that of LDA. The comparison is somewhat unfair, since not all of the index sample is classified when the Kendall method is used. A factor, f, has been introduced to allow comparison of the probabilities of misclassification only with respect to the portion of the sample actually classified. The factor f is the reciprocal of the
TABLE 23

COMPARISON OF THE PERFORMANCE OF KENDALL'S METHOD
AND LINEAR DISCRIMINANT ANALYSIS WITH RESPECT
TO THE PROBABILITIES OF MISCLASSIFICATION

Case   ρ1     ρ2    σ²   μ    P*_T(K)/P*_T(LDA)   f·[P*_T(K)/P*_T(LDA)]

 1     .1     .9    2    1    .737                .957
 2    -.1     .9    2    1    .752                .941
 3     .5     .5    2    1    .712                .879
 4     .9     .1    2    1    .763                .913
 5     .1     .1    2    1    .956                1.08
 6     .1     .9    2    2    1.06                1.12
 7    -.1     .9    2    2    1.15                1.19
 8     .5     .5    2    2    1.04                1.08
 9     .9     .1    2    2    1.03                1.07
10     .1     .1    2    2    1.74                1.78
11     .5     .5    1    1    .663                .835
12     .9     .1    1    1    .780                .828
13     .1     .1    1    1    1.05                1.14
14     .5     .5    1    2    1.02                1.04
15     .9     .1    1    2    1.11                1.17
16     .1     .1    1    2    2.81                2.86
fraction of the index sample which was classified by Kendall's method. One of the columns, headed $f \cdot [P_T^*(K)/P_T^*(LDA)]$, lists this adjusted ratio.
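The two columns of Table 23 are related by simple arithmetic. The sketch below assumes equal index sample sizes (500 from each population), so that the classified fraction of the combined index sample is 1 - P*(0). The function name and the input values are hypothetical.

```python
def kendall_ratios(pt_kendall, pt_lda, p0_kendall):
    """Return P*_T(K)/P*_T(LDA) and f * [P*_T(K)/P*_T(LDA)].

    p0_kendall is P*(0), the unclassified fraction of the index sample under Kendall's
    method; f = 1 / (1 - p0_kendall) is the reciprocal of the fraction classified.
    """
    ratio = pt_kendall / pt_lda
    f = 1.0 / (1.0 - p0_kendall)
    return ratio, f * ratio


# Hypothetical inputs: Kendall misclassifies 10%, LDA 20%, and Kendall leaves 25% unclassified.
print(kendall_ratios(0.10, 0.20, 0.25))  # (0.5, 0.666...)
```

Because f is at least 1, the adjusted ratio can never favor Kendall's method more than the raw ratio does.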

Considering the ratio $P_T^*(K)/P_T^*(LDA)$, in cases 1-6, 11, and 12 Kendall's method has a smaller $P_T^*$ value than LDA. In cases 7-9 and 13-15 the two methods perform about the same. The Kendall method is definitely worse in cases 10 and 16. These are the two cases with the largest $T^2$ values and with equal values of $\rho_1$ and $\rho_2$. Considering the ratio $f \cdot [P_T^*(K)/P_T^*(LDA)]$, similar results are obtained for all cases except case 6, for which the Kendall method is better, and cases 10 and 16, for which LDA is better.

3.3.2 Effect of Unequal Mean Components

In one sampling experiment, the Mahalanobis distance was kept constant and the components of the mean vector were varied. More specifically, in one sampling experiment already considered ($\rho_1 = .1$, $\rho_2 = .9$, $\sigma^2 = 2$, $\boldsymbol\mu' = (2, 2, 2, 2, 2)$), the Mahalanobis distance was 3.78. The components of the mean vector were then chosen to be $(0, 0, 1, 1, x)$, where $x$ was such that the distance was unchanged. The value of $x$ was found to be 1.512. Table 24 shows, as expected, that $P_T^*$ does not change for linear discriminant analysis. However, $P_T^*$ decreases substantially for Kendall's method, and this is only slightly indicated by the decrease in P(1,1) and P(2,2). The point of this single example is that the error rates can be strongly influenced by changes in the mean vector, even when the Mahalanobis distance is unchanged. To some extent this would be anticipated since, as already noted, Kendall's method depends more on the overlap of the distributions than on the distances between their means.
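The value $x = 1.512$ can be recovered numerically from equation (23). The sketch below holds $T^2$ at its value for $\boldsymbol\mu = \mathbf{2}$ and bisects on the last component of $(0, 0, 1, 1, x)$. The helper names are hypothetical, and bisection is used only for simplicity. Because $T^2$ is quadratic in $x$, the quadratic formula would give the same root.

```python
def t_squared(mu, rho1, rho2, sigma2):
    """Equation (23)."""
    p = len(mu)
    a, b = 1.0 + sigma2, rho1 + sigma2 * rho2
    return 2.0 / (a - b) * (sum(m * m for m in mu) - b * sum(mu) ** 2 / (a + (p - 1) * b))


def solve_last_component(prefix, target, rho1, rho2, sigma2, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection for x such that T^2(prefix + [x]) = target; needs a sign change on [lo, hi]."""
    def g(x):
        return t_squared(prefix + [x], rho1, rho2, sigma2) - target

    if g(lo) * g(hi) > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


target = t_squared([2.0] * 5, 0.1, 0.9, 2.0)
x = solve_last_component([0.0, 0.0, 1.0, 1.0], target, 0.1, 0.9, 2.0)
print(round(x, 3))  # 1.512
```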


TABLE  24 
3.3.3 Effect of Allowed Probabilities of Misclassification

One problem with Kendall's method is that, especially with distributions of infinite range, one or more extreme-valued observations from one distribution will often be mixed with the values from the second distribution, and little separation of the samples from the two distributions will be possible. Accepting these few outlying observations as allowed misclassifications, eliminating them from further consideration, and proceeding with Kendall's method can produce a set of discrimination rules that increases the probabilities of classification in the index sample with only a limited increase in the probabilities of misclassification. That is, many of the observations that were previously unclassified will now be classified, most of them correctly.

There are many ways to introduce an allowed probability of misclassification. For example, a cumulative allowed probability of misclassification could be specified for each group. However, there is then a problem of allocation. Suppose that, for samples of 20 from each of the two populations, an allowed probability of misclassification of 0.1 is specified. Suppose also that low values of the first selected variate favor one population and high values favor the other. Then the cumulative allowed probability of misclassification could be used immediately, allowing two misclassifications for low values of the variate and two misclassifications for high values of the variate. However, better overall results may be obtained by using fewer than these four allowed misclassifications for the first selected variate and saving some of them to improve the classification quality of the next or subsequent variates. The computer logic for allocating the four allowed misclassifications so as to maximize the overall performance of the method could be worked out relatively easily, and some consideration may be given to this at a later time. There would, of course, be a considerable increase in computing time.

For the particular set of experiments reported here, a very simple rule was used, one which is far from optimal. An allowed probability of misclassification of 0.05 was used for low values of the first selected variate and 0.05 for high values of this variate. If the selected variate was such that samples from one population had low values of this variate and samples from the other population had high values, then the allowed probability of misclassification for each group would be 0.05. However, if both high and low values of a variate were characteristic of a single population, then the allowed probability of misclassification would in effect be 0.1 for one group, whereas there would be no allowed misclassification for the other group.
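The effect of this simple rule on a single variate can be sketched as follows. The sketch assumes, as a hypothetical reading of Kendall's step, that without allowed misclassifications the lower cutoff is the smallest value in the sample whose values run high and the upper cutoff is the largest value in the sample whose values run low. With an allowed probability PA, the k = floor(PA·n) most extreme opposing values in each tail are accepted as misclassifications and skipped when the cutoff is placed. For n = 20 and PA = 0.05 this gives k = 1 per tail. This is not the study's program.

```python
import math


def tail_cutoffs(low_sample, high_sample, pa=0.05):
    """Hypothetical one-variate cutoffs with an allowed misclassification in each tail.

    Values below `lower` are assigned to the population whose values run low; values
    above `upper` are assigned to the population whose values run high.
    """
    # Number of opposing observations that may be misclassified in each tail
    # (the small epsilon guards against floating-point error in pa * n).
    k_low = math.floor(pa * len(high_sample) + 1e-9)
    k_high = math.floor(pa * len(low_sample) + 1e-9)
    lower = sorted(high_sample)[k_low]       # skip the k_low smallest "high" values
    upper = sorted(low_sample)[-1 - k_high]  # skip the k_high largest "low" values
    return lower, upper


# Samples of 20 in which each population has one outlier.
low = list(range(19)) + [40]          # 0..18 and an outlying 40
high = [-3] + list(range(15, 34))     # an outlying -3 and 15..33
print(tail_cutoffs(low, high, 0.0))   # (-3, 40): no observation lies outside the cutoffs
print(tail_cutoffs(low, high, 0.05))  # (15, 18): 15 observations from each sample now classified
```

The example shows the mechanism described above. A single outlier in each sample blocks every classification on this variate, and accepting that one outlier per tail as a misclassification frees most of both samples.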

The effect of the allowed probability of misclassification was considered in several sampling experiments, and the results are summarized in Table 25. Kendall's method is denoted by K, and Kendall's method with an allowed probability of misclassification is denoted by K(PA). The effect of increased sample size on the index sample probabilities obtained by Kendall's method can be seen for samples of size 20, 50, and 100 in the case $\sigma^2 = 1$, $\boldsymbol\mu = \mathbf{1}$, $\rho_1 = \rho_2 = .5$, and for samples of size 20 and 100 in the case $\sigma^2 = 2$, $\boldsymbol\mu = \mathbf{2}$, $\rho_1 = .9$, $\rho_2 = .1$. Because of the increased mixing of the distributions with the larger sample sizes, the portion of the sample classified decreases greatly, with the compensation of fewer misclassifications of the index sample. If the primary goal is to minimize misclassifications, this effect is of no concern. However, if a substantial portion of both samples must be classified, even at the risk of increased misclassification, then a trade-off between the portion of the sample classified and the portion misclassified is necessary. A simple method for controlling the probability of misclassification while trying to increase the portion of the sample classified was explained above.

From Table 25 it can be seen that in the case $\rho_1 = \rho_2 = .5$ the effect of the allowed probability of misclassification is not very impressive. In the case $\rho_1 = .9$, $\rho_2 = .1$ with a sample size of 100, however, P*(1,1) increases from .347 to .733 with relatively minor increases in the probabilities of misclassification.

Examining Tables 16, 17, and 18, the effect of the principal components transformation used with Kendall's method may be evaluated with respect to the index sample. When $\rho_1 = .9$, $\rho_2 = .1$, the transformation is of value when $\boldsymbol\mu$ = £. In this case P*(1,1) = .246 and P*(2,2) = .458 when no transformation is used, and P*(1,1) = .546 and P*(2,2) = .754 when the principal components transformation is used. P*(1,1) and P*(2,2) are not increased significantly when $\rho_1 = .9$, $\rho_2 = .1$, and 1 £. When $\rho_1 = .1$, $\rho_2 = .9$, the only case for which the principal components transformation gives an improvement is $\sigma^2 = 2$, $\boldsymbol\mu = \mathbf{0}$; in the remainder of the cases it reduces the performance considerably. When $\rho_1 = -.1$, $\rho_2 = .9$, the only case in which the transformation improves the performance of Kendall's method is $\sigma^2 = 2$, $\boldsymbol\mu$ = £. It is worth noting that when the principal components transformation is used, almost all of the initial sample is correctly assigned, but the performance with the index sample tends to be very poor. For example, in the case $\rho_1 = .1$, $\rho_2 = .9$, $\sigma^2 = 2$, $\boldsymbol\mu = \mathbf{1}$, P(1,1) = .954 and P(2,2) = .885, but P*(1,1) = .617 and P*(2,2) = .645.
3.4 Comparison in Cases When Multivariate Normality Does Not Apply

The analysis so far has allowed the possibility of unequal variance-covariance matrices, but multivariate normality has still been assumed. Some sampling experiments have also been done with non-normal distributions. The distributions considered, the Cauchy and the uniform, were selected because of the difficulty of distinguishing between these distributions and the normal on the basis of a small sample from the distribution.

In the Cauchy sampling experiments, samples were selected from each of two populations, $\Pi_1$ and $\Pi_2$:

$\Pi_1$: $\mathbf{x}' = (x_1, x_2, x_3, x_4, x_5)$. All $x_i$'s are independent and distributed as Cauchy random variables, with density $\pi^{-1}(1 + x^2)^{-1}$.

$\Pi_2$: $\mathbf{z}\ (p \times 1) \sim MVN(\mathbf{2}, I)$, where $\mathbf{2}' = (2, 2, 2, 2, 2)$.

In the uniform sampling experiments, samples were obtained from each of two populations, $\Pi_1$ and $\Pi_2$:

$\Pi_1$: $\mathbf{x}' = (x_1, x_2, x_3, x_4, x_5)$. All $x_i$'s are independent and distributed as uniform U(-1, 1) random variables.

$\Pi_2$: $\mathbf{z} \sim MVN(\boldsymbol\mu, I)$, $\boldsymbol\mu = \mathbf{0}, \mathbf{1}, \mathbf{2}$.

In one experiment $\Pi_1$ was as above, but

$\Pi_2$: $\mathbf{z}' = (z_1, z_2, z_3, z_4, z_5)$. All $z_i$'s are independent and distributed as U(0, 2) random variables.
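Samples from these populations can be generated in the same spirit as the normal vectors, by transforming uniform variates. The sketch below is an illustration under stated assumptions. It produces standard Cauchy variates by the inverse-CDF method ($\tan(\pi(U - \tfrac12))$ for $U \sim U(0,1)$), which is a standard construction but not necessarily the one used in the study. It also uses Python's `random.uniform` and `random.gauss` for the uniform and $MVN(\boldsymbol\mu, I)$ components. The function names are hypothetical.

```python
import math
import random


def cauchy_vector(rng, p=5):
    """p independent standard Cauchy variates (density 1 / (pi (1 + x^2)))
    via the inverse CDF: tan(pi (U - 1/2)), U ~ U(0,1)."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(p)]


def uniform_vector(rng, lo, hi, p=5):
    """p independent U(lo, hi) variates."""
    return [rng.uniform(lo, hi) for _ in range(p)]


def normal_vector(rng, mean, p=5):
    """MVN(mean * 1, I): p independent N(mean, 1) components."""
    return [rng.gauss(mean, 1.0) for _ in range(p)]


rng = random.Random(11)
# Pi_1 Cauchy, Pi_2 MVN(2, I), initial samples of 20 from each population.
sample1 = [cauchy_vector(rng) for _ in range(20)]
sample2 = [normal_vector(rng, 2.0) for _ in range(20)]
```

A quick sanity check on the Cauchy generator is that about half of its values should fall in (-1, 1), since the standard Cauchy quartiles are -1 and 1.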


The sampling experiments are described in more detail in Table 26. The results of using Kendall's method are given in Table 27, and the results of using LDA in Table 28. A comparison of the results is given in Table 29. In experiments 1, 2, and 3, $\Pi_1$ is Cauchy and $\Pi_2$ is multivariate normal. In experiment 1 there is a relatively large separation between the means. Kendall's method and LDA give comparable $P_T^*$ values, but LDA gives the rather large P*(2,1) value of 0.31. In experiment 2 the sample size is increased from 20 to 100. Now only 2% of population 2 is classified using Kendall's method, since with the larger sample size there is more mixing of the samples from the two populations. In experiment 3 there is an allowed probability of misclassification, $PA_1 = PA_2 = 0.05$, and more of the sample can be assigned.

Experiment  one  can  serve  as  an  example  where  Kendall's  method 
would  be  preferred  to  LDA  if  the  maximum  probability  of  misclassification 
were  of  concern. 

In experiments 4, 5, and 6, $\Pi_1$ is U(-1,1) and $\Pi_2$ is multivariate normal. In experiment 4, LDA is, of course, not applicable, since there is no mean difference between the populations. However, experiment 4 would indicate the superiority of Kendall's method for small mean differences. In experiments 5 and 6 the mean differences become increasingly large, and LDA performs better. In experiment 7, $\Pi_1$ is U(-1,1) and $\Pi_2$ is U(0,2). Both Kendall's method and LDA perform well, but LDA gives the lower probability of misclassification.

All of the distributions in the examples considered so far have been symmetrical. Two sampling experiments were done using
TABLE 26

DESCRIPTION OF SAMPLING EXPERIMENTS
FOR NON-NORMAL DISTRIBUTIONS

C: Cauchy; U: Uniform; n1, n2: sample sizes;
PA1, PA2: allowed probabilities of misclassification

Experiment
Number       Description

1            Π1 = C,        Π2 = MVN(2, I),  n1 = n2 = 20
2            Π1 = C,        Π2 = MVN(2, I),  n1 = n2 = 100
3            Π1 = C,        Π2 = MVN(2, I),  n1 = n2 = 100;  PA1 = PA2 = .05
4            Π1 = U(-1,1),  Π2 = MVN(0, I),  n1 = n2 = 20
5            Π1 = U(-1,1),  Π2 = MVN(1, I),  n1 = n2 = 20
6            Π1 = U(-1,1),  Π2 = MVN(2, I),  n1 = n2 = 20
7            Π1 = U(-1,1),  Π2 = U(0,2),     n1 = n2 = 20
TABLE 29

COMPARISON OF THE RESULTS OF SAMPLING EXPERIMENTS
WITH CAUCHY AND UNIFORM DISTRIBUTIONS USING
KENDALL'S METHOD AND LINEAR DISCRIMINANT
ANALYSIS

Experiment
Number       Method    P*(2,1)    P*(1,2)    P*_T

1            K         .148       .137       .143
             LDA       .311       .047       .179
2            K         .003       .020       .012
             LDA       .297       .046       .172
3            K         .027       .080       .049
4            K         .422       .201       .312
             LDA       .422       .552       .487
5            K         .132       .197       .165
             LDA       .045       .164       .105
6            K         .055       .089       .072
             LDA       .000       .020       .010
7            K         .123       .121       .122
             LDA       .036       .031       .034

distributions which were not symmetric. In experiments eight and nine, the populations were as follows:

$\Pi_1$: $\mathbf{x}' = (x_1, x_2, x_3, x_4, x_5)$. All $x_i$'s are independent and lognormally distributed, $\ln x_i \sim N(0,1)$.

$\Pi_2$: Same as $\Pi_1$, except that the lognormal distribution is shifted by $\mu$, where $\mu = 1, 2$ in experiments eight and nine, respectively.

The results, summarized in Table 30, indicate the distinct superiority of Kendall's method in experiment eight. The two methods perform comparably in experiment nine, where there is a larger mean difference between the two populations.


TABLE  30 
CHAPTER IV


COMPARISON  OF  THE  PROBABILITIES  OF  MISCLASSIFICATION 
FOR  THE  LINEAR  DISCRIMINANT  FUNCTION  DETERMINED 
FROM  THE  SAMPLING  EXPERIMENTS  WITH  THE 
THEORETICAL  VALUES  OBTAINED  BY 
1)  GILBERT,  2)  OKAMOTO,  AND 
3)  LACHENBRUCH. 

4.1  Comparison  with  Gilbert's  Results 

Gilbert (7) has considered the effect of unequal variance-covariance matrices on Fisher's linear discriminant function. She has calculated the probability of misclassification $\Pi P(2,1) + (1-\Pi)P(1,2)$ as a function of $\Pi$ (the a priori probability that a sample comes from population 1), $T^2$, and $d$ in the case $\Sigma_2 = d\Sigma_1$. No work was done, however, in determining the probability of misclassification when the population parameters are estimated. The results of some of the sampling experiments already considered in this thesis may be used to provide the expected value of the probability of misclassification when the population parameters are estimated.

Consider two populations, $\mathbf{x}_1 \sim MVN(\boldsymbol\mu_1, \Sigma_1)$ and $\mathbf{x}_2 \sim MVN(\boldsymbol\mu_2, d\Sigma_1)$. Choosing a nonsingular matrix $P$ such that $P\Sigma_1P' = I$ and using the transformation $\mathbf{y}_i = P(\mathbf{x}_i - \boldsymbol\mu_1)$, the distributions may be expressed in the canonical form $\mathbf{y}_1 \sim MVN(\mathbf{0}, I)$, $\mathbf{y}_2 \sim MVN(\boldsymbol\nu, D)$, where $D = \mathrm{Diag}(d, d, \ldots, d)$. The total probability of misclassification is minimized by the rule which assigns an observation to population two whenever $\log[(1-\Pi)f_2(\mathbf{x})/\Pi f_1(\mathbf{x})] > 0$, and to population one otherwise.
Using the linear discriminant function, an observation is classified as
population  two  whenever 


The  cutoff  point  C  is  chosen  to  minimize  the  total  probability  of 
misclassification.  Gilbert  finds  that 
$$C = \frac{1}{d-1}\left\{-T^2 \pm \left[dT^2\left(T^2 + \frac{2(d-1)}{d+1}\left(\log d + 2\log\frac{\Pi}{1-\Pi}\right)\right)\right]^{1/2}\right\}, \qquad d \ne 1 \tag{30}$$

$$C = \tfrac12 T^2 + \log\frac{\Pi}{1-\Pi}, \qquad d = 1 \tag{31}$$


Gilbert notes that the only instance when the optimal cutoff point is not given by equation (30) or (31) is when assigning all the observations to the same population yields a lower total probability of misclassification.

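For the case in which we are interested,

$$\Sigma_1 = (1 - \rho)I + \rho E_{pp}$$

$$\Sigma_2 = \sigma^2[(1 - \rho)I + \rho E_{pp}]$$

and $d = \sigma^2$. $T^2$ has previously been calculated, using equation (23), for $\mathbf{x}_1 \sim MVN(\mathbf{0}, \Sigma_1)$ and $\mathbf{x}_2 \sim MVN(\boldsymbol\mu_2, \Sigma_2)$. Since $T^2$ is invariant under linear transformations, this value may be used in equations (30) and (31) to calculate P(2,1) and P(1,2). For the analysis in this thesis, $\Pi = \tfrac12$ always, so

$$P(2,1) = 1 - \Phi\left[\left(\frac{1+\sigma^2}{2T^2}\right)^{1/2} C\right] \tag{32}$$

$$P(1,2) = \Phi\left[\left(\frac{1+\sigma^2}{2\sigma^2 T^2}\right)^{1/2} (C - T^2)\right] \tag{33}$$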
where now

$$C = \frac{-T^2 \pm \sigma T\left[T^2 + \dfrac{2(\sigma^2 - 1)}{1 + \sigma^2}\log\sigma^2\right]^{1/2}}{\sigma^2 - 1}, \qquad \sigma^2 \ne 1 \tag{34}$$

$$C = \tfrac12 T^2, \qquad \sigma^2 = 1 \tag{35}$$

Gilbert tables the values of the total probability of misclassification for selected values of $\Pi$, $T^2$, and $d$. The particular values of $\boldsymbol\mu$ chosen in this study resulted in $T^2$ values which were not tabled. Equations (32) and (33) were therefore used to calculate P(2,1) and P(1,2), and the probability of misclassification

$$P_T = \tfrac12[P(2,1) + P(1,2)] \tag{36}$$

The results of these calculations are given in Table 31.
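The theoretical values can be computed from equations (32)-(36). The sketch below evaluates P(2,1), P(1,2), and $P_T$ for a given cutoff and computes the cutoff from (34) or (35) with $\Pi = \tfrac12$. Of the two roots in (34), it keeps the one with the smaller $P_T$; the other root is a stationary point where $P_T$ is close to ½. The function names are hypothetical, and $\Phi$ is taken from Python's `statistics.NormalDist`. With $\sigma^2 = 1$ the result reduces to the familiar $\Phi(-T/2)$.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def misclassification(c, t2, sigma2):
    """P(2,1), P(1,2) and P_T from equations (32), (33) and (36) for cutoff c."""
    p21 = 1.0 - PHI(math.sqrt((1.0 + sigma2) / (2.0 * t2)) * c)
    p12 = PHI(math.sqrt((1.0 + sigma2) / (2.0 * sigma2 * t2)) * (c - t2))
    return p21, p12, 0.5 * (p21 + p12)


def optimal_cutoff(t2, sigma2):
    """Cutoff from equation (35) if sigma2 == 1, otherwise the better root of (34)."""
    if sigma2 == 1.0:
        return 0.5 * t2
    # sigma T [T^2 + 2(sigma^2 - 1) log(sigma^2) / (1 + sigma^2)]^(1/2)
    root = math.sqrt(sigma2 * t2 * (t2 + 2.0 * (sigma2 - 1.0) / (1.0 + sigma2) * math.log(sigma2)))
    candidates = [(-t2 + sign * root) / (sigma2 - 1.0) for sign in (1.0, -1.0)]
    return min(candidates, key=lambda c: misclassification(c, t2, sigma2)[2])


t2 = 3.7736  # rho1 = .1, rho2 = .9, sigma^2 = 2, mu = 2 (equation (23))
c = optimal_cutoff(t2, 2.0)
print(round(c, 3), misclassification(c, t2, 2.0))
```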

In  Table  32  the  theoretical  values  of  the  probabilities  of 
misclassification  for  linear  discriminant  analysis  are  compared  with 
the  sampling  experiment  results. 

From Table 32 it is seen that, as expected, $P_T$ underestimates $P_T^*$; in most cases P(2,1) underestimates P*(2,1), and P(1,2) underestimates P*(1,2). It should also be noted that $P_T$ is a poor estimate of $P_T^*$ except in the cases when the sample size is 100 (denoted by a prime in Table 32).




TABLE  31 
The prime denotes a repetition of the experiment with a sample size of 100.


4.2 Comparison of the Sampling Experiment Results in the Cases Σ1 = Σ2 with the Results of Okamoto and with the Results of Lachenbruch

It is interesting to compare the $P_T^*$ values with those which may be obtained in a different way. Okamoto (27) has derived an asymptotic expansion for the distribution of the linear discriminant function statistic $W$ (the distribution of the sample discriminant function). In the particular cases of interest in this thesis, the expansion is

$$\Pr\{W < 0\} = \Phi\left(-\tfrac12 T\right) + \frac{a_1}{n_1} + \frac{a_2}{n_2} + \frac{a_3}{n} + \frac{b_{11}}{n_1^2} + \frac{b_{22}}{n_2^2} + \frac{b_{12}}{n_1 n_2} + \frac{b_{13}}{n_1 n} + \frac{b_{23}}{n_2 n} + \frac{b_{33}}{n^2} + O_3 \tag{37}$$


where $n_1$ and $n_2$ are the sizes of the samples from populations $\Pi_1$ and $\Pi_2$, respectively, and $n_1 + n_2 - 2 = n$. In Table 1 of his paper, Okamoto gives the values of the coefficients for a number of values of $p$, including the case of interest to us, $p = 5$, and for a number of $T$ values, $T = 1, 2, 3, 4, 6, 8$. The particular $T$ values in our cases are not included in the table, however. To obtain a highly accurate result, the coefficients should be calculated for these particular $T$ values. Okamoto's expansion has nevertheless been applied to the cases in Table 32, using Okamoto's tabled coefficients for the $T$ value closest to each of our particular $T$ values. The results of the calculations are given in Table 33. It can be seen that the $P_T^*$ values obtained in the sampling experiments and the $P_T^*$ values calculated using Okamoto's asymptotic expansion are very close.


TABLE 33

COMPARISON OF P*_T VALUES

Experiment    P*_T                   P*_T        P*_T
Number        Sampling Experiment    Okamoto     Lachenbruch

11            .300                   .310        .302
13            .203                   .198        .209
13'           .177                   .180        .181
14            .122                   .125        .127
16            .042                   .037        .045

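Lachenbruch (4) provides another way to calculate $P_T^*$. He considers the sample discriminant function

$$D_S(\mathbf{x}) = [\mathbf{x} - \tfrac12(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)]'S^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \tag{38}$$

which is conditionally (on $\bar{\mathbf{x}}_1$, $\bar{\mathbf{x}}_2$, and $S$) normally distributed and has mean (in the $k$th group)

$$D_S(\boldsymbol\mu_k) = [\boldsymbol\mu_k - \tfrac12(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)]'S^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \tag{39}$$

and variance (in either group)

$$V_D = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'S^{-1}\Sigma S^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \tag{40}$$

He finds that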


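$$E[D_S(\boldsymbol\mu_k)] = \frac{n_1 + n_2 - 2}{2(n_1 + n_2 - p - 3)}\left[(-1)^{k+1}T^2 - \frac{p(n_2 - n_1)}{n_1 n_2}\right] \tag{41}$$

$$E(V_D) = \frac{(n_1 + n_2 - 2)^2(n_1 + n_2 - 3)}{(n_1 + n_2 - p - 2)(n_1 + n_2 - p - 3)(n_1 + n_2 - p - 5)}\left[T^2 + \frac{p(n_1 + n_2)}{n_1 n_2}\right] \tag{42}$$

For $n_1$ and $n_2$ sufficiently large, the unconditional distribution is very close to normal, and

$$P_1 = \Phi\left\{-E[D_S(\boldsymbol\mu_1)]/[E(V_D)]^{1/2}\right\} \tag{43}$$

$$P_2 = \Phi\left\{E[D_S(\boldsymbol\mu_2)]/[E(V_D)]^{1/2}\right\} \tag{44}$$

will supply approximate values for $P_1$ and $P_2$, where

$$P_1 = P(D_S(\mathbf{x}) < 0 \mid \mathbf{x} \in \Pi_1) \tag{45}$$

$$P_2 = P(D_S(\mathbf{x}) > 0 \mid \mathbf{x} \in \Pi_2) \tag{46}$$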
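Equations (41)-(44) are straightforward to evaluate. The sketch below computes $P_1$, $P_2$, and their average for given $T^2$, $n_1$, $n_2$, and $p$. The function name is hypothetical. The sign of the $p(n_2 - n_1)$ term in (41) is the one that follows from the definition (38), and that term vanishes for the equal sample sizes used in this study. For the equal-covariance cases of Table 33 with $n_1 = n_2 = 20$ and $p = 5$, the exact $T^2$ values from equation (23) are 5/3, 25/7, 20/3, and 100/7, and the results round to the Lachenbruch column.

```python
import math
from statistics import NormalDist


def lachenbruch_pt(t2, n1, n2, p):
    """Approximate P_1, P_2 and their mean for the sample LDF, equations (41)-(44)."""
    n_total = n1 + n2
    scale = (n_total - 2) / (2.0 * (n_total - p - 3))
    shift = p * (n2 - n1) / (n1 * n2)
    e_d1 = scale * (t2 - shift)    # E[D_S(mu_1)], k = 1
    e_d2 = scale * (-t2 - shift)   # E[D_S(mu_2)], k = 2
    e_v = ((n_total - 2) ** 2 * (n_total - 3)
           / ((n_total - p - 2) * (n_total - p - 3) * (n_total - p - 5))
           * (t2 + p * n_total / (n1 * n2)))
    phi = NormalDist().cdf
    p1 = phi(-e_d1 / math.sqrt(e_v))
    p2 = phi(e_d2 / math.sqrt(e_v))
    return p1, p2, 0.5 * (p1 + p2)


for label, t2 in [("11", 5 / 3), ("13", 25 / 7), ("14", 20 / 3), ("16", 100 / 7)]:
    print(label, round(lachenbruch_pt(t2, 20, 20, 5)[2], 3))
```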


Equations (41), (42), (43), and (44) were used to calculate $P_T^*$ by Lachenbruch's method. The results are given in Table 33. The methods
CHAPTER V


THE  MODIFIED  BARTLETT  AND  PLEASE  METHOD 

5.1  Introduction 

Bartlett  and  Please  have  obtained  a  linear  discriminant  function 
in  the  case  of  zero  mean  differences  when  the  variance-covariance 
matrices  are  of  the  form  considered  in  this  thesis,  i.e. , 

$$ \Sigma_1 = (1-\rho_1) I + \rho_1 E_{pp} $$

$$ \Sigma_2 = \sigma^2 [(1-\rho_2) I + \rho_2 E_{pp}] $$


In the case of zero mean differences, it is of interest to compare Kendall's order-statistic method, or a variation thereof, with this Bartlett and Please method. However, the cutoff point which they obtain for equal probabilities of misclassification is shown to be incorrect. This chapter will be concerned with the development of a modified Bartlett and Please method which does provide equal probabilities of misclassification. In Section 5.4 this modified Bartlett and Please method will be compared with Kendall's method.

5.2 Derivation of the Modified Bartlett and Please Method

Let us consider two populations

$$ f_i(\mathbf{x}) \sim \mathrm{MVN}(\mathbf{0}, \Sigma_i) \qquad (i = 1, 2) $$

where

$$ \Sigma_i = \sigma_i^2 [(1-\rho_i) I + \rho_i E_{pp}] \qquad (i = 1, 2). $$


A discriminant function may be derived by considering the likelihood ratio principle:

If f1(x)/f2(x) > λ, assign x to Π1.


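Expanding the ratio and using the logarithmic transformation,

$$ \frac{(2\pi)^{-p/2}|\Sigma_1|^{-1/2}\exp(-\tfrac{1}{2}\mathbf{x}'\Sigma_1^{-1}\mathbf{x})}{(2\pi)^{-p/2}|\Sigma_2|^{-1/2}\exp(-\tfrac{1}{2}\mathbf{x}'\Sigma_2^{-1}\mathbf{x})} > \lambda $$

$$ -\tfrac{1}{2}\ln|\Sigma_1| - \tfrac{1}{2}\mathbf{x}'\Sigma_1^{-1}\mathbf{x} + \tfrac{1}{2}\ln|\Sigma_2| + \tfrac{1}{2}\mathbf{x}'\Sigma_2^{-1}\mathbf{x} > \ln\lambda $$

$$ \mathbf{x}'\Sigma_1^{-1}\mathbf{x} - \mathbf{x}'\Sigma_2^{-1}\mathbf{x} < -2\ln\lambda - \ln|\Sigma_1| + \ln|\Sigma_2| $$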


When Σ1 = Σ2, it is a well-known result that λ = 1 provides equal probabilities of misclassification. Bartlett and Please have inadvertently assumed that λ = 1 also provides equal probabilities of misclassification when Σ1 ≠ Σ2. When λ = 1,
$$ \mathbf{x}'\Sigma_1^{-1}\mathbf{x} - \mathbf{x}'\Sigma_2^{-1}\mathbf{x} < -\ln|\Sigma_1| + \ln|\Sigma_2| = \ln(|\Sigma_2|/|\Sigma_1|) $$


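In the particular case that we are considering, σ1² = 1 and σ2² = σ²; if, in addition, ρ1 = ρ2, then |Σ2|/|Σ1| = (σ²)^p, so x is assigned to Π1 if

$$ \mathbf{x}'\Sigma_1^{-1}\mathbf{x} - \mathbf{x}'\Sigma_2^{-1}\mathbf{x} < p \ln \sigma^2 \qquad (53) $$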


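The discriminant function equation (53) may be written in the form

$$ a z_1 - b z_2 = K^* $$

where

$$ z_1 = \mathbf{x}'\mathbf{x}, \qquad z_2 = (\mathbf{E}_p'\mathbf{x})^2, \qquad \mathbf{E}_p = (1, 1, \ldots, 1)' $$

$$ a = \frac{1}{1-\rho_1} - \frac{1}{\sigma^2(1-\rho_2)} \qquad (55) $$

$$ b = \frac{\rho_1}{(1-\rho_1)[1+(p-1)\rho_1]} - \frac{\rho_2}{\sigma^2(1-\rho_2)[1+(p-1)\rho_2]} \qquad (56) $$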
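As an arithmetic check, the sketch below evaluates a and b from (55) and (56), together with the Bartlett and Please cutoff K* = p ln σ², for the two cases examined later in this chapter (p = 10; σ² = 3.760 with ρ1 = ρ2 = 0.160, and σ² = 2.362, the Case II value in the Table 35 heading, with ρ1 = ρ2 = 0.223). The helper names are hypothetical and σ1² = 1 is assumed, as in the text.

```python
import math

def bp_coefficients(p, rho1, rho2, sigma2):
    """Coefficients a and b of a*z1 - b*z2, equations (55) and (56), with sigma_1^2 = 1."""
    a = 1.0 / (1.0 - rho1) - 1.0 / (sigma2 * (1.0 - rho2))
    b = (rho1 / ((1.0 - rho1) * (1.0 + (p - 1) * rho1))
         - rho2 / (sigma2 * (1.0 - rho2) * (1.0 + (p - 1) * rho2)))
    return a, b

def bp_cutoff(p, sigma2):
    """Bartlett and Please cutoff K* = p ln(sigma^2); valid when rho1 = rho2."""
    return p * math.log(sigma2)

for label, sigma2, rho in [("Case I", 3.760, 0.160), ("Case II", 2.362, 0.223)]:
    a, b = bp_coefficients(10, rho, rho, sigma2)
    print(label, round(a, 5), round(b, 5), round(bp_cutoff(10, sigma2), 2))
```

The printed values should agree with the a, b, and K* quoted for Cases I and II in the sampling-experiment discussion.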


When ρ1 = ρ2 = ρ, the discriminant function becomes

$$ z_1 - \frac{\rho}{1+(p-1)\rho}\, z_2 = \frac{(1-\rho)\, p \ln \sigma^2}{1 - 1/\sigma^2} \qquad (57) $$


As noted earlier, the cutoff point given by the right side of equation (57) does not provide equal probabilities of misclassification. Using this cutoff point, however, the Bartlett and Please rule is
$$ \text{If } z_1 - \frac{\rho}{1+(p-1)\rho}\, z_2 < \frac{(1-\rho)\, p \ln \sigma^2}{1 - 1/\sigma^2}, \text{ assign to } \Pi_1 $$

$$ \text{If } z_1 - \frac{\rho}{1+(p-1)\rho}\, z_2 > \frac{(1-\rho)\, p \ln \sigma^2}{1 - 1/\sigma^2}, \text{ assign to } \Pi_2 $$


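A method for obtaining a cutoff point which does provide equal probabilities of misclassification will now be developed. Since U below equals σ_i²(1 - ρ) x'Σ_i^{-1}x and x'Σ_i^{-1}x has a χ²(p) distribution, it may be shown that

$$ U = z_1 - \frac{\rho}{1+(p-1)\rho}\, z_2 \sim \sigma_i^2 (1-\rho)\, \chi^2(p) \qquad \text{when } \mathbf{x} \text{ comes from } \Pi_i. $$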
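The distributional claim for U can be checked by simulation. This is a minimal Monte Carlo sketch under stated assumptions (0 ≤ ρ < 1, Case I values σ² = 3.76, ρ = 0.16, p = 10, and hypothetical helper names): it draws equicorrelated normal vectors, forms U, and compares the first two moments of U/[σ²(1 - ρ)] with those of χ²(p), namely p and 2p.

```python
import math
import random

def sample_equicorrelated(p, sigma2, rho, rng):
    """One draw from MVN(0, sigma2[(1 - rho)I + rho E_pp]), assuming 0 <= rho < 1."""
    common = rng.gauss(0.0, 1.0)          # shared component gives covariance sigma2 * rho
    s = math.sqrt(sigma2)
    return [s * (math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0) + math.sqrt(rho) * common)
            for _ in range(p)]

def u_statistic(x, rho):
    """U = z1 - rho / (1 + (p - 1) rho) * z2, with z1 = x'x and z2 = (sum of x)^2."""
    p = len(x)
    z1 = sum(v * v for v in x)
    z2 = sum(x) ** 2
    return z1 - rho / (1.0 + (p - 1) * rho) * z2

def check_u_moments(p=10, sigma2=3.76, rho=0.16, reps=20_000, seed=1):
    """Sample mean and variance of U / (sigma2 (1 - rho)); chi-square(p) gives p and 2p."""
    rng = random.Random(seed)
    scaled = [u_statistic(sample_equicorrelated(p, sigma2, rho, rng), rho)
              / (sigma2 * (1.0 - rho)) for _ in range(reps)]
    m = sum(scaled) / reps
    v = sum((t - m) ** 2 for t in scaled) / (reps - 1)
    return m, v

print(check_u_moments())
```

Matching moments does not prove the distributional result, but a clear mismatch would flag an error in the reconstruction of U.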


Hence the rule becomes

Assign x to Π1 when U < K
Assign x to Π2 when U > K

with K to be determined so that there are equal probabilities of misclassification. The probabilities of misclassification are:


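$$ \alpha_1 = \int_{U > K} f_1(\mathbf{x})\,d\mathbf{x}, \qquad \alpha_2 = \int_{U < K} f_2(\mathbf{x})\,d\mathbf{x}. $$

With σ1² = 1 and σ2² = σ², these become

$$ \alpha_1 = \int_{K/(1-\rho)}^{\infty} \phi_p(\omega)\,d\omega \quad \text{and} \quad \alpha_2 = \int_{0}^{K/[\sigma^2(1-\rho)]} \phi_p(\omega)\,d\omega, \qquad (64) $$

respectively, where φ_p denotes the χ²(p) density.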

The correct procedure for equal probabilities of misclassification would be to choose the value of K which makes the two integrals in (64) equal.
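To see numerically that the Bartlett and Please cutoff does not equalize the two error probabilities, the sketch below evaluates (64) at the right side of (57). It assumes ρ1 = ρ2 = ρ and σ1² = 1; `chi2_cdf` is a hand-written power series for the regularized incomplete gamma function (so no SciPy is assumed), and all function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square distribution function via the power series for P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def exact_error_rates(k_u, p, sigma2, rho):
    """Equation (64) for a cutoff k_u on the U scale."""
    alpha1 = 1.0 - chi2_cdf(k_u / (1.0 - rho), p)        # U > K although x is from Pi_1
    alpha2 = chi2_cdf(k_u / (sigma2 * (1.0 - rho)), p)   # U < K although x is from Pi_2
    return alpha1, alpha2

def bp_cutoff_on_u_scale(p, sigma2, rho):
    """Right side of equation (57): (1 - rho) p ln(sigma2) / (1 - 1/sigma2)."""
    return (1.0 - rho) * p * math.log(sigma2) / (1.0 - 1.0 / sigma2)

k_u = bp_cutoff_on_u_scale(10, 3.76, 0.16)
print(exact_error_rates(k_u, 10, 3.76, 0.16))
```

For Case I (p = 10, σ² = 3.76, ρ = 0.160) this gives α1 ≈ 0.054 and α2 ≈ 0.096, the unequal pair listed for the BP method in Table 34.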

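Now the function ω = a z1 - b z2 may be written, for x from Π_i, as

$$ \omega = a(1-\rho_i)\sigma_i^2\, \chi^2_{p-1} + (a - bp)\sigma_i^2 [1 + (p-1)\rho_i]\, \chi^2_1 $$

with the two chi-square variables independent, and therefore

$$ E(\omega) = a(1-\rho_i)\sigma_i^2 (p-1) + (a - bp)\sigma_i^2 [1 + (p-1)\rho_i] = f_i $$

$$ V(\omega) = 2a^2(1-\rho_i)^2\sigma_i^4 (p-1) + 2(a - bp)^2\sigma_i^4 [1 + (p-1)\rho_i]^2 = 2g_i $$

Approximating the distribution of ω by that of (g_i/f_i)χ²(n_i), with n_i = f_i²/g_i chosen so that the first two moments agree, gives

$$ \alpha_2 = \Pr(\omega < K \mid \mathbf{x} \text{ comes from } \Pi_2) = \int_0^{f_2K/g_2} \phi_{n_2}(\zeta)\,d\zeta \qquad (68) $$

$$ \alpha_1 = \Pr(\omega > K \mid \mathbf{x} \text{ comes from } \Pi_1) = \int_{f_1K/g_1}^{\infty} \phi_{n_1}(\zeta)\,d\zeta \qquad (69) $$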
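The moment quantities f_i, g_i, and n_i follow directly from the expressions above. This short sketch (hypothetical helper names; σ1² = 1 assumed) computes them and illustrates a useful property: when ρ1 = ρ2, the two chi-square coefficients coincide, so the moment-matched degrees of freedom n_i equal p and the approximation in (68) and (69) becomes exact.

```python
def omega_moments(a, b, p, sigma2, rho):
    """f = E(omega) and g = V(omega) / 2 for omega = a z1 - b z2 in one population.

    omega = c1 * chi2(p - 1) + c2 * chi2(1), with independent chi-square variables.
    """
    c1 = a * (1.0 - rho) * sigma2
    c2 = (a - b * p) * sigma2 * (1.0 + (p - 1) * rho)
    f = c1 * (p - 1) + c2                 # E(omega)
    g = c1 ** 2 * (p - 1) + c2 ** 2       # V(omega) = 2 g
    return f, g

def matched_df(f, g):
    """n = f^2 / g, so that (g / f) chi2(n) has mean f and variance 2 g."""
    return f * f / g

# Case I values (p = 10, sigma^2 = 3.76, rho1 = rho2 = 0.16), a and b from (55) and (56).
a = 1 / 0.84 - 1 / (3.76 * 0.84)
b = 0.16 / (0.84 * 2.44) - 0.16 / (3.76 * 0.84 * 2.44)
print(matched_df(*omega_moments(a, b, 10, 1.0, 0.16)),
      matched_df(*omega_moments(a, b, 10, 3.76, 0.16)))
```

For unequal correlations, such as ρ1 = .9 and ρ2 = .1, the coefficients differ and n_i is generally not an integer, which is why the chi-square density in (68) and (69) must accept fractional degrees of freedom.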


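The value of K is to be chosen so that the two integrals in (68) and (69) are equal. The procedure used to find the value of K was a Newton-Raphson method. To solve the equation H(x) = 0, a sequence of x values is calculated:

$$ x_{i+1} = x_i - \frac{H(x_i)}{H'(x_i)} \qquad (70) $$

When the difference between x_{i+1} and x_i becomes acceptably small, x_{i+1} is taken as the solution of the equation. Here

$$ H(K) = \int_0^{f_2K/g_2} \phi_{n_2}(\zeta)\,d\zeta - \int_{f_1K/g_1}^{\infty} \phi_{n_1}(\zeta)\,d\zeta \qquad (71) $$

$$ \frac{dH(K)}{dK} = h(K) = \frac{f_2}{g_2}\,\phi_{n_2}\!\left(\frac{f_2K}{g_2}\right) + \frac{f_1}{g_1}\,\phi_{n_1}\!\left(\frac{f_1K}{g_1}\right) \qquad (72) $$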
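The Newton-Raphson search in (70) to (72) can be sketched as follows. This is an illustration under stated assumptions, not the thesis program: σ1² = 1, the parameters are treated as known, the chi-square functions are hand-written (fractional degrees of freedom allowed), the iteration starts from the Bartlett and Please cutoff p ln σ², and all names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square distribution function via the power series for P(df/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = total = 1.0 / s
    n = 1
    while term > 1e-15 * total:
        term *= z / (s + n)
        total += term
        n += 1
    return math.exp(s * math.log(z) - z - math.lgamma(s)) * total

def chi2_pdf(x, df):
    """Chi-square density phi_df(x)."""
    if x <= 0.0:
        return 0.0
    s = df / 2.0
    return math.exp((s - 1.0) * math.log(x) - x / 2.0 - s * math.log(2.0) - math.lgamma(s))

def omega_moments(a, b, p, sigma2, rho):
    """f = E(omega) and g = V(omega) / 2 for omega = a z1 - b z2."""
    c1 = a * (1.0 - rho) * sigma2
    c2 = (a - b * p) * sigma2 * (1.0 + (p - 1) * rho)
    return c1 * (p - 1) + c2, c1 ** 2 * (p - 1) + c2 ** 2

def equal_error_cutoff(p, rho1, rho2, sigma2, K0, tol=1e-10, max_iter=50):
    """Newton-Raphson solution of H(K) = 0; returns (K, alpha1, alpha2)."""
    a = 1.0 / (1.0 - rho1) - 1.0 / (sigma2 * (1.0 - rho2))              # (55)
    b = (rho1 / ((1.0 - rho1) * (1.0 + (p - 1) * rho1))
         - rho2 / (sigma2 * (1.0 - rho2) * (1.0 + (p - 1) * rho2)))     # (56)
    f1, g1 = omega_moments(a, b, p, 1.0, rho1)     # sigma_1^2 = 1
    f2, g2 = omega_moments(a, b, p, sigma2, rho2)
    n1, n2 = f1 * f1 / g1, f2 * f2 / g2            # moment-matched degrees of freedom

    def alphas(K):
        alpha1 = 1.0 - chi2_cdf(f1 * K / g1, n1)   # (69)
        alpha2 = chi2_cdf(f2 * K / g2, n2)         # (68)
        return alpha1, alpha2

    K = K0
    for _ in range(max_iter):
        alpha1, alpha2 = alphas(K)
        H = alpha2 - alpha1                                            # (71)
        h = ((f2 / g2) * chi2_pdf(f2 * K / g2, n2)
             + (f1 / g1) * chi2_pdf(f1 * K / g1, n1))                  # (72)
        K_new = K - H / h                                              # (70)
        if K_new <= 0.0:
            K_new = K / 2.0            # keep the iterate positive
        converged = abs(K_new - K) < tol
        K = K_new
        if converged:
            break
    return (K,) + alphas(K)

print(equal_error_cutoff(10, 0.160, 0.160, 3.760, K0=10 * math.log(3.760)))
```

For the Case I parameters this converges in a few steps to K ≈ 12.38 with α1 = α2 ≈ 0.077, in line with the values reported for the sampling experiments. Because H is increasing in K, a bisection search would be a slower but more robust alternative.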


The parameters are estimated by the following procedure:

$$ \text{Let } \sum_{r=1}^{n_i} Z_{2r} = A_i \qquad (i = 1, 2) \qquad (73) $$
Lachenbruch (2) has developed an excellent method for estimating the probabilities of misclassification in linear discriminant analysis. In this method, one observation is omitted from the sample used to calculate the discriminant function, and the discriminant function is then used to classify the omitted observation. This is done in turn for each observation. The number of misclassifications provides a good indication of the probability of misclassification. Lachenbruch's method provides a much better estimate of the probability of misclassification than the resubstitution method, in which the entire sample is used to calculate the discriminant function and this function is in turn used to classify each of the observations. A method related to that of Lachenbruch is developed in this thesis and compared with the resubstitution method. It is not strictly a Lachenbruch method, since the cutoff points for equal probabilities of misclassification are not recalculated each time an observation is omitted from the sample. Let

g  =  a  Z.  -  b  Z„  (83) 

r  r  lr  r  2r 

be  the  estimated  function  omitting  the  rth  observation  from  U^, 
where 


a 

r 


(n. 


l)(p  -  1) 


(B, 


Z3r> 


n2(p  -  1) 


B 


2 


(84) 
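The difference between the leave-one-out idea and resubstitution can be illustrated with any classification rule. The sketch below uses a hypothetical toy rule on one variable (cutoff midway between the two sample means, assign to population 1 below the cutoff) as a stand-in for the function g_r of (83); it does not reproduce the thesis estimators a_r and b_r.

```python
from statistics import mean

def fit_cutoff(x1, x2):
    """Hypothetical toy rule: cutoff halfway between the two sample means."""
    return (mean(x1) + mean(x2)) / 2.0

def is_error(value, cutoff, from_pop1):
    """The toy rule assigns to population 1 when value < cutoff."""
    return value >= cutoff if from_pop1 else value < cutoff

def resubstitution_rates(x1, x2):
    """Fit on the whole sample, then classify every observation with that fit."""
    c = fit_cutoff(x1, x2)
    e1 = sum(is_error(v, c, True) for v in x1) / len(x1)
    e2 = sum(is_error(v, c, False) for v in x2) / len(x2)
    return e1, e2

def leave_one_out_rates(x1, x2):
    """Refit without observation r, then classify observation r, for each r in turn."""
    e1 = sum(is_error(x1[r], fit_cutoff(x1[:r] + x1[r + 1:], x2), True)
             for r in range(len(x1))) / len(x1)
    e2 = sum(is_error(x2[r], fit_cutoff(x1, x2[:r] + x2[r + 1:]), False)
             for r in range(len(x2))) / len(x2)
    return e1, e2

x1 = [0.0, 1.0, 2.0, 3.0, 3.9]
x2 = [4.1, 5.0, 6.0, 7.0, 8.0]
print(resubstitution_rates(x1, x2), leave_one_out_rates(x1, x2))
```

In this small example resubstitution reports no errors, while leaving out the boundary observations 3.9 and 4.1 shifts the cutoff enough to misclassify each of them, illustrating the optimism of resubstitution.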
were 15 samples each from the monozygotic and dizygotic male twins and 15 samples each from the monozygotic and dizygotic female twins.

The values of σ² and ρ_i estimated from the samples were as follows:

Case I (Females): σ² = 3.760, ρ1 = ρ2 = 0.160
Case II (Males): σ² = 2.362, ρ1 = ρ2 = 0.223

These two cases considered by Bartlett and Please were reexamined. Random samples with parameters equal to the parameters estimated by Bartlett and Please were generated. Experiments with total sample sizes of 30 and 200 were performed under a variety of conditions. All experiments were repeated 10 times for each set of parameters and set of conditions. The two general cases considered were then

Case I: σ1² = 1, σ2² = 3.76, ρ1 = ρ2 = 0.160, p = 10

Case II: σ1² = 1, σ2² = 2.362, ρ1 = ρ2 = 0.223, p = 10

Solving the equation H(K) = 0 yielded the following values of K with corresponding values of α:

Case I: K = 12.38, α1 = α2 = 0.077

Case II: K = 8.03, α1 = α2 = 0.176
where K is the cutoff point for the discriminant function

$$ a Z_1 - b Z_2 = K. $$

Values of a and b were calculated from Equations (55) and (56):

Case  I:  a  =  0.87386 

b  =  0.05730 


Case  II:  a  =  0.74212 

b  =  0.05504 


The cutoff point, K*, in the Bartlett and Please discriminant function

$$ a Z_1 - b Z_2 = K^* $$

is equal to p log_e σ², so for Case I it is 13.24 and for Case II it is 8.60. Using these cutoff points, the true probabilities of misclassification may be calculated directly from the equations

$$ \alpha_2 = \int_0^{f_2K^*/g_2} \phi_{n_2}(\zeta)\,d\zeta \qquad (89) $$

$$ \alpha_1 = \int_{f_1K^*/g_1}^{\infty} \phi_{n_1}(\zeta)\,d\zeta \qquad (90) $$


These values were found to be as follows:

Case I: α1 = 0.054, α2 = 0.096

Case II: α1 = 0.136, α2 = 0.211

The values of α1 and α2 estimated by Bartlett and Please from their data were α1 = 0.13, α2 = 0.00 for Case I and α1 = 0.47, α2 = 0.27 for Case II.

The four discriminant functions are:

Equal Probabilities of Misclassification

Case I: 0.87386Z1 - 0.05730Z2 = 12.38
Case II: 0.74212Z1 - 0.05504Z2 = 8.03

Bartlett and Please Method

Case I: 0.87386Z1 - 0.05730Z2 = 13.24
Case II: 0.74212Z1 - 0.05504Z2 = 8.60

Dividing through by the coefficient of Z1, they may be expressed in the form used in the Bartlett and Please article:

Equal Probabilities of Misclassification

Case I: Z1 - 0.06557Z2 = 14.17
Case II: Z1 - 0.07417Z2 = 10.82

Bartlett and Please Method

Case I: Z1 - 0.06557Z2 = 15.15
Case II: Z1 - 0.07417Z2 = 11.59

Using the calculated values of K and K*, the Lachenbruch method may be evaluated by comparing the α1 and α2 values obtained by this method with those calculated using equations (68) and (69). The resubstitution method was evaluated in a similar fashion. In addition, both the Lachenbruch and resubstitution methods were used in conjunction with cutoff points K and K* which were estimated from the data. The results of all of these experiments are summarized in Tables 34 and 35 for Cases I and II, respectively. The symbols used in the tables are defined as follows:

α̂_i : Calculated by Lachenbruch method
α̃_i : Calculated by resubstitution method
α̂*_i : Calculated by Lachenbruch method with K estimated from sample
α̃*_i : Calculated by resubstitution method with K estimated from sample
TABLE 34

SUMMARY OF SAMPLING EXPERIMENT RESULTS
WITH BARTLETT AND PLEASE AND
MODIFIED BARTLETT AND
PLEASE METHODS

Case I (σ² = 3.760)

K Theoretical

Type      N1     α1     α2     α̂1     α̂2     α̃1     α̃2
Eq. Pr.   15     .077   .077   .120   .087   .080   .087
Eq. Pr.   100    .077   .077   .086   .073   .083   .074
BP        15     .054   .096   .073   .100   .053   .100
BP        100    .054   .096   .064   .094   .056   .094

K Estimated

Type      N1     α̂*1    α̂*2    α̃*1    α̃*2
Eq. Pr.   15     .093   .060   .053   .073
Eq. Pr.   100    .083   .072   .082   .079
BP        15     .073   .080   .047   .087
BP        100    .062   .093   .055   .094
TABLE 35

SUMMARY OF SAMPLING EXPERIMENT RESULTS
WITH BARTLETT AND PLEASE AND
MODIFIED BARTLETT AND
PLEASE METHODS

Case II (σ² = 2.362)

K Theoretical

Type      N1     α1     α2     α̂1     α̂2     α̃1     α̃2
Eq. Pr.   15     .176   .176   .240   .133   .194   .153
Eq. Pr.   100    .176   .176   .172   .163   .166   .164
BP        15     .136   .211   .213   .174   .133   .200
BP        100    .136   .211   .145   .197   .139   .199

K Estimated

Type      N1     α̂*1    α̂*2    α̃*1    α̃*2
Eq. Pr.   15     .233   .153   .167   .147
Eq. Pr.   100    .174   .157   .169   .157
BP        15     .180   .187   .127   .194
BP        100    .122   .187   .132   .187
Comparing α̂_i and α̃_i, it can be seen that the resubstitution method gives better results than the Lachenbruch method, probably because the estimation procedure used in the latter method is poor.

The differences between the performance of the various methods may be seen more clearly by comparing the values of

$$ |\alpha_1 - [\alpha_1]| + |\alpha_2 - [\alpha_2]| \qquad (91) $$

where α_i is the theoretical value and [α_i] may be α̂_i, α̃_i, α̂*_i, or α̃*_i. The value of this expression is denoted by D̂, D̃, D̂*, or D̃*, depending on the [α_i] used.

The values of D are summarized in Table 36. The superiority of the method of resubstitution is quite apparent for the smaller sample size, although the two methods provide comparable results for the larger sample size.
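Equation (91) is simple arithmetic on the tabled probabilities. The sketch below (hypothetical helper name) recomputes two entries of Table 36 from the Case I, equal-probability, N1 = 15 row of Table 34.

```python
def total_deviation(theoretical, estimated):
    """Equation (91): |alpha1 - [alpha1]| + |alpha2 - [alpha2]|."""
    (t1, t2), (e1, e2) = theoretical, estimated
    return abs(t1 - e1) + abs(t2 - e2)

theory = (0.077, 0.077)   # Case I, equal-probability cutoff (Table 34)
d_lachenbruch = total_deviation(theory, (0.120, 0.087))   # alpha-hat values, N1 = 15
d_resub = total_deviation(theory, (0.080, 0.087))         # alpha-tilde values, N1 = 15
print(round(d_lachenbruch, 3), round(d_resub, 3))          # 0.053 0.013
```

Applying the same function to each row of Tables 34 and 35 reproduces most entries of Table 36, which is a convenient consistency check on the transcribed values.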

In the case of estimated cutoff points, the procedure used was not actually a Lachenbruch method, since the cutoff points were not recalculated for each sample after a particular observation had been deleted. This could have been done, but the amount of computation would have increased considerably. For example, with a sample size of 200 the equation H(K) = 0 for the equal probability case would have to be solved 200 times. Although it is doubtful whether this refinement would improve the results for the case of sample size 30, some further study will be devoted to it. However, the value of this method as a practical approach would be questionable even if some improvement resulted. This Lachenbruch method would be improved substantially only by improving the estimation procedure for a_r and b_r.


TABLE 36

COMPARISON OF ERROR PROBABILITIES ASSOCIATED
WITH BARTLETT AND PLEASE AND MODIFIED
BARTLETT AND PLEASE METHODS

Case I (σ² = 3.760)

Type      N1     D̂      D̃      D̂*     D̃*
Eq. Pr.   15     .053   .013   .033   .028
Eq. Pr.   100    .013   .009   .011   .007
BP        15     .023   .005   .035   .016
BP        100    .003   .004   .011   .003

Case II (σ² = 2.362)

Type      N1     D̂      D̃      D̂*     D̃*
Eq. Pr.   15     .107   .041   .080   .038
Eq. Pr.   100    .017   .022   .021   .016
BP        15     .114   .014   .068   .026
BP        100    .023   .015   .036   .028
5.4 Comparison of the Modified Bartlett
and  Please  Method  with  Kendall's  Method 

The modified Bartlett and Please method was applied to some of the data sets obtained in the sampling experiments. The modified method was applied to each of the 50 sets of samples and the results averaged, just as has been done in all of the previous sampling experiments discussed so far. The results are given in Table 37. K, as before, refers to Kendall's method; K(PC), to Kendall's method with principal components; and MBP, to the modified Bartlett and Please method. The method of resubstitution was used to obtain P(1,1) and P(2,2) for the latter method.

The excellent performance of the modified Bartlett and Please method is evident from Table 37. The principal components transformation was applied in the case ρ1 = .9, ρ2 = .1, and, although the performance of Kendall's method was improved considerably, the modified Bartlett and Please method was still superior.

It is, of course, unfair to compare Kendall's method to the modified Bartlett and Please method in cases for which the latter method was designed. It would be interesting to compare Kendall's method to the modified Bartlett and Please method when there is actually a small mean difference.


TABLE  37 
CHAPTER VI


COMPARISON  OF  KENDALL'S  METHOD  WITH 
OTHER NONPARAMETRIC TECHNIQUES

6.1  Successive  Screening 

Feldman, Klein, and Honigfeld (28) have developed a discrimination method which is quite similar to Kendall's order-statistic method. Their method can be explained by referring to their hypothetical example given in Table 38. It is desired to separate out Group B. For scores of four and above, the ratio of Group B to Group A is 10/1; for three and above it is 25/10. The cutoff point for the parameter is that value for which the ratio of Group B to Group A is largest. Each parameter is examined in like fashion, and the parameter giving the highest ratio is selected. All samples with that parameter value beyond the cutoff point are eliminated from further consideration, and the procedure is continued on the remaining samples using the remaining variables. The procedure stops when the maximum ratio at any stage falls below a specified limit.
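The cutoff selection for a single ordinal variable can be sketched directly from the counts in Table 38. This is a minimal illustration with a hypothetical function name; it examines only the "score at or above the cutoff" direction used in the example, and treats a cutoff with no Group A samples beyond it as an infinite ratio (an assumption, since the text does not say how that case is handled).

```python
def best_screening_cutoff(counts_a, counts_b):
    """Return (cutoff, ratio) maximizing (Group B at or above cutoff) / (Group A at or above cutoff).

    counts_a and counts_b map each ordinal score to its frequency in Groups A and B.
    """
    best = None
    for cutoff in sorted(counts_a):
        a = sum(n for score, n in counts_a.items() if score >= cutoff)
        b = sum(n for score, n in counts_b.items() if score >= cutoff)
        if a == 0:
            ratio = float("inf") if b > 0 else 0.0   # assumed convention
        else:
            ratio = b / a
        if best is None or ratio > best[1]:
            best = (cutoff, ratio)
    return best

group_a = {1: 30, 2: 20, 3: 9, 4: 1}     # Table 38, Group A
group_b = {1: 30, 2: 25, 3: 15, 4: 10}   # Table 38, Group B
print(best_screening_cutoff(group_a, group_b))
```

For the Table 38 counts the ratios at cutoffs 4, 3, 2, and 1 are 10/1, 25/10, 50/30, and 80/60, so the cutoff "four and above" is selected.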

In Kendall's method the cutoff point is chosen so that only samples from one of the populations lie beyond the cutoff point or else, using the allowed probability of misclassification, so that no more than a specified number of samples from the other population lie beyond it. In Kendall's method, separation of samples from each of the two populations is equally important.



TABLE  38 


HYPOTHETICAL  DISTRIBUTION  OF  ITEM  RATINGS 
FOR  TWO  DIAGNOSTIC  GROUPS  (FROM 
FELDMAN, ET AL (28))


              Ordinal Scale
              1     2     3     4     Total
Group A       30    20    9     1     60
Group B       30    25    15    10    80
It would require only minor changes in Kendall's procedure to obtain that of Feldman, Klein, and Honigfeld. What would be the advantage of modifying Kendall's method in this way? In the case in which the variables are on an ordinal scale but only a small number of values are possible, such as in rating a personality trait or opinion on a scale from 0 through 10 in increments of 1, there may be considerable overlap in samples from the two populations; frequently there would even be complete overlap. In situations like this, modifying the Kendall order-statistic method to incorporate the Feldman, Klein, and Honigfeld ideas would be advisable. Since Kendall's method is more generally applicable than that of Feldman, Klein, and Honigfeld, and since only minor modifications of the computer program based on Kendall's method would be needed to incorporate this optional method, it would appear more advantageous to make this modification than to have two different methods available in the form of separate, but quite similar, computer programs.

Feldman, Klein, and Honigfeld list eight advantages in the use of successive screening in medical work. These eight advantages are quoted here, since the same claims may be made for Kendall's order-statistic method:


1. No restrictions are placed on the data distributions or the joint distributions.

2. The categories are polythetic, i.e., members of the same class need not have even a single trait in common, but must exhibit a minimum number of alternative trait set members. To some degree this resembles the ambiguity that occurs with keys formed by the summation of weighted items, but the successive screening method stipulates a specified minimum intensity for any trait to be of consequence in the discrimination.

3. Pathognomic signs can be easily recognized, but, in general, classification is made by sign pattern.

4. Certain classes may be 'ruled out' by certain traits.

5. Ordinal traits may have both extremes used for the same effect, i.e., a U-shaped relationship of classification to trait can be utilized.

6. The successive screening technique is essentially a counting procedure and does not utilize mathematical procedures that depend on interval or ratio scales.

7. The procedure makes easily inspected prima facie sense and does not involve such obscurities as suppressor variables.

8. The model systematizes the sequential screening approach of clinical diagnosis but avoids the problem of making serious misclassifications through single measurement errors, by using multiple alternative traits at each decision point.


6.2  Henrichon  and  Fu  Algorithm 

There  is  another  nonparametric  discriminant  analysis  technique 
which  has  been  well  received  by  specialists  in  pattern  recognition  (29), 
although  it  has  not  been  reported  in  the  statistical  literature.  This 
is  the  Henrichon-Fu  technique  (30,  31)  which  will  now  be  explained. 
Consider first the univariate case with two populations Π1 and Π2. Let x = (x1, x2, ..., x_{n1}) be a set of independent observations from Π1 and y = (y1, y2, ..., y_{n2}) be a set of independent observations from Π2. The x and y observations are represented respectively by a) and b) in Figure 2 below.


Step 1. (Figure 2, c) - Combine the x and y observations and order the set x ∪ y according to increasing numerical value. Partition the set x ∪ y so that there are K samples in each cell. In Figure 2, K = 5.

Step 2. (Figure 2, d and e) - Let C_i (i = 1, 2) be the cost of misclassifying an observation from Π_i (i = 1, 2) and let C_0 be the cost
of  not  classifying  an  observation.  In  Figure  2,  =  6,  =  1. 

Count the number of x's and y's in each cell and assign the cell to Π1, Π2, or Π0 (unclassified) according to the following:

If

$$ \sum_{j \ne i} C_j \,(\text{no. of samples from } \Pi_j) < C_0 K $$

then assign to Π_i (i = 1, 2). Otherwise assign to Π0.

Then combine adjacent cells of the same class.


Step  3.  Adjust  the  cell  boundaries  by  perturbing  them  a  maximum 
of  K/2  samples  in  either  direction  and  locate  the  boundary  at  the 




Step 4. For any remaining cell with fewer than K/2 samples, dissolve the cell, placing its samples into adjacent cells in such a way that there is the smallest increase in misclassification (Figure 2, g).

Step 5. Repeat Step 2. For this final partition compute the empirical classification statistic, or Score:

$$ \text{Score} = \sum_{i=1}^{2} C_i\,(\text{No. of samples misclassified from } \Pi_i) + C_0\,(\text{No. of unclassified samples}) \qquad (92) $$
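A simplified univariate version of Steps 1, 2, and 5 can be sketched as follows. This is not the Henrichon-Fu program: the boundary perturbation of Step 3 and the cell dissolution of Step 4 are omitted, the last cell may hold fewer than K samples (its actual size is used in place of K), and the function name is hypothetical.

```python
def henrichon_fu_1d(x, y, K, C1, C2, C0):
    """Simplified univariate sketch of Steps 1, 2 and 5 (Steps 3 and 4 omitted).

    Returns the merged cells as (low, high, label) triples, with label 1, 2,
    or 0 (unclassified), together with the Score of equation (92).
    """
    # Step 1: pool and order the observations, then cut into cells of K samples.
    pooled = sorted([(v, 1) for v in x] + [(v, 2) for v in y])
    cells = [pooled[i:i + K] for i in range(0, len(pooled), K)]

    # Step 2: label each cell by comparing misclassification and rejection costs.
    labels = []
    for cell in cells:
        n1 = sum(1 for _, pop in cell if pop == 1)
        n2 = len(cell) - n1
        size = len(cell)                   # equals K except possibly for the last cell
        if C2 * n2 < C0 * size:
            labels.append(1)
        elif C1 * n1 < C0 * size:
            labels.append(2)
        else:
            labels.append(0)

    # Step 5: Score = sum of C_i * (misclassified from Pi_i) + C0 * (unclassified).
    score = 0
    for cell, label in zip(cells, labels):
        for _, pop in cell:
            if label == 0:
                score += C0
            elif label != pop:
                score += C1 if pop == 1 else C2

    # Combine adjacent cells of the same class.
    merged = []
    for cell, label in zip(cells, labels):
        low, high = cell[0][0], cell[-1][0]
        if merged and merged[-1][2] == label:
            merged[-1] = (merged[-1][0], high, label)
        else:
            merged.append((low, high, label))
    return merged, score

print(henrichon_fu_1d([1, 2, 3, 4, 5, 8], [6, 7, 9, 10, 11, 12], K=3, C1=4, C2=4, C0=1))
```

With misclassification costs well above C0, any cell containing a sample from the other population is left unclassified, so the middle of this small example becomes a single Π0 region; in the multivariate procedure only such Π0 cells would be partitioned further.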

The procedure is extended to the multidimensional case by calculating a score for each variate and selecting the variate with the lowest score first. After the space has been partitioned with this first variate, the resulting cells are further partitioned using the variate with the second lowest score. This procedure is continued until no new cells are formed.

The Henrichon-Fu algorithm is similar to the Kendall algorithm. In the Kendall method, observations once classified using a particular variable are removed from further consideration in examining any other variables. In the Henrichon-Fu algorithm the observations are not explicitly removed from further consideration once they are classified as Π1 or Π2, but this is indeed the effect, since only the Π0 cells resulting from the use of a variable are further partitioned when the next variables are considered.
If we consider these two techniques applied to two unimodal distributions, then they are quite similar. Consider the Kendall method with no initial misclassifications allowed. The Henrichon-Fu algorithm with C1 and C2 very much larger than C0 would cause all cells with at least one observation differing from the rest to be labelled Π0, producing the same effect. For any selection of allowed probabilities of misclassification with the Kendall method, the same results could be produced with the Henrichon-Fu algorithm using suitably chosen values of C0, C1, C2, and K.

The  superiority  of  the  Henrichon-Fu  algorithm  becomes  evident  when 
multimodal  distributions  are  considered.  Referring  to  Figure  3,  it  can 
be  seen  that  Kendall's  method  would  be  useless  but  the  Henrichon-Fu 
method  would  be  appropriate.  Another  point  in  favor  of  the  Henrichon-Fu 
algorithm  is  that  it  is  readily  usable  with  more  than  two  populations, 
but  the  Kendall  method  in  the  present  form  is  usable  only  with  two 
populations. 


Figure 3. Hypothetical Case When Kendall's Method Would Not Perform As Well As the Henrichon-Fu Method
The Kendall method could still be better in the unimodal, two-population case, since a poor choice of K with the Henrichon-Fu algorithm could produce poor results. Of course, the data could be examined prior to use of the Henrichon-Fu algorithm in order to select the best value of K. However, if there are many variates, each possibly requiring a different value of K, the task could become difficult. There is another important reason why the Kendall method would still be preferred: Kendall's method is easily used in routine discrimination, even without the use of a digital computer.

In summary, for discriminating between two unimodal distributions the Kendall method may still be better, but for many other cases the Henrichon-Fu algorithm promises to be a superior method. Regardless of the advantages of the Henrichon-Fu algorithm in more sophisticated situations, the Kendall method still has the distinct advantage that it can be used without recourse to a digital computer.

Even when the Henrichon-Fu algorithm is applicable, it would be interesting to consider the use of an initial Kendall method prior to application of the Henrichon-Fu technique.

Further  study  of  the  Henrichon-Fu  algorithm  and  its  possible 
modifications  and  a  more  complete  comparison  of  the  two  algorithms  are 
planned  for  future  research. 


6.3  Nearest  Neighbor  and  Related  Methods 

Fix and Hodges (32) have considered a nearest neighbor method in which a point to be allocated is assigned to the class of the nearest classified point in the p-dimensional space. Variations of this method allocate a point to the class of the majority of its k nearest points. The latter method is referred to as the k-nearest neighbor decision rule.
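The k-nearest neighbor rule described above can be sketched in a few lines. This is a generic illustration using Euclidean distance (an assumption; other metrics are possible), with a hypothetical function name; with k = 1 it reduces to the nearest neighbor rule of Fix and Hodges.

```python
import math
from collections import Counter

def knn_classify(point, training, k=1):
    """Assign point to the majority class among its k nearest training points.

    training is a list of (coordinates, label) pairs; distance is Euclidean.
    For two classes an odd k avoids tied votes.
    """
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1),
            ((5, 5), 2), ((5, 6), 2), ((6, 5), 2)]
print(knn_classify((0.2, 0.2), training, k=3), knn_classify((5.5, 5.5), training, k=3))
```

The rule stores every classified point and consults all of them for each allocation, which is one reason its classification regions become complicated when the samples are well mixed.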

Kendall (9) criticizes the nearest neighbor methods for generating impossibly complicated classification regions when there is considerable mixing of the samples from the different populations, which is the situation of most importance.

Pelto (33) has developed a method which he calls adaptive nonparametric classification. This method estimates probability densities by counting the known points observed within a hypersphere around the point to be classified. The radius of the hypersphere is fixed so as to minimize the expected loss of the decision rule.
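The counting idea behind this method can be illustrated with a fixed radius. This sketch is not Pelto's procedure: the radius is simply supplied by the caller rather than chosen to minimize expected loss, equal a priori probabilities and Euclidean distance are assumed, and the function name is hypothetical. Because the hypersphere volume is the same for both populations, it cancels when the two density estimates are compared.

```python
import math

def hypersphere_classify(point, samples1, samples2, radius):
    """Compare density estimates from counts of known points within radius of point.

    Each count is divided by its sample size; the common hypersphere volume cancels.
    Returns 1 or 2 for the larger estimate, or 0 when the estimates tie.
    """
    d1 = sum(math.dist(point, s) <= radius for s in samples1) / len(samples1)
    d2 = sum(math.dist(point, s) <= radius for s in samples2) / len(samples2)
    if d1 > d2:
        return 1
    if d2 > d1:
        return 2
    return 0

samples1 = [(0, 0), (0, 1), (1, 0)]
samples2 = [(5, 5), (5, 6)]
print(hypersphere_classify((0.5, 0.5), samples1, samples2, radius=1.0))
```

A point with no known samples inside the hypersphere is left unclassified here, which mirrors the role of the radius: too small a radius leaves many points unallocated, too large a radius blurs the density estimates.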


CHAPTER  VII 


CONCLUSIONS 

7.1  Summary  of  Results 

1. Kendall's order-statistic method is a promising technique in discriminant analysis. In cases where the linear discriminant function (LDF) is appropriate, Kendall's method does not give much larger error rates than are obtained by use of the LDF. In cases of multivariate random variables with symmetric distributions, such as the Cauchy or the uniform (both with independent random variables), or multivariate normal random variables with unequal variance-covariance matrices, Kendall's method gives lower error rates than those obtained by use of the LDF for populations with small mean differences. It gives error rates comparable to those obtained by use of the LDF for populations with larger mean differences. However, in most of these cases a relatively large portion of the index sample will not be classified. The multivariate lognormal (independent variables) has been considered as a representative case of a multivariate random variable with a distribution which is not symmetric. In this case Kendall's method provides much lower error rates than are obtained by use of the LDF, and most of the index sample is classified.
2. Use of an allowed probability of misclassification in the initial sample can greatly increase the portion of the index sample classified, while raising the error rates only slightly.

3.  Maintaining  the  Mahalanobis  distance  but  reducing  the 
overlap  of  the  component  variates  may  sharply  increase  the  error  rate 
using  Kendall's  method. 

4.  Error  rates  for  linear  discriminant  functions  estimated 
from  samples  are  compared  with  the  theoretical  error  rates  for  a 
number  of  cases  of  multivariate  normality  with  unequal  variance- 
covariance  matrices. 

5. A modified Bartlett and Please method has been developed which provides equal probabilities of misclassification. This has been applied in a number of cases.

7.2 Future Work

Some  interesting  questions  for  future  research  would  be 

1.  Extension  of  Kendall's  method  to  more  than  two  populations. 
If  A,  B,  and  C  denote  three  populations,  one  possible  approach  would 
be  to  find  which  is  most  easily  discriminated,  A  from  B  and  C,  B  from 
A  and  C,  or  C  from  A  and  B.  That  separation  is  then  carried  out.  Then 
discrimination  could  be  tried  between  one  of  the  two  groups  remaining 
and  the  other  group  combined  with  the  residual  group  from  the  first 
separation. 
2. Investigation of the effect of the allowed probability of misclassification in Kendall's method. More work needs to be done on the trade-off between increasing the acceptable misclassification level and decreasing the portion of the sample unclassified.

3.  Investigation  of  the  effect  of  differences  in  sample  sizes 
in  Kendall's  method.  This  could  be  an  important  factor,  particularly 
if  one  sample  size  is  much  larger  than  the  other. 

4.  Investigation  of  the  effect  of  non-normal  multivariate 
populations  on  Kendall's  method.  This  work  is  important  in  order  to 
develop  a  measure  of  the  probabilities  of  misclassification  to  be 
expected. 


5. Investigation of the effect of unequal mean components on Kendall's method, i.e., μ' = (μ1, μ2, μ3, μ4, μ5), with μ_i not necessarily equal to μ_j (i ≠ j). In the one example considered in the study, maintaining the Mahalanobis distance but changing the mean components strongly affected the result.

6. Investigation of the effect of values of p other than 5. With smaller values of p discrimination will be reduced, but the question is how much. With larger values of p, Kendall's method, like linear discriminant analysis, may classify the initial sample only too well, greatly overestimating the probabilities of correct classification and underestimating the probabilities of misclassification.
7. Consideration of the effect of the a priori probability
that  a  sample  comes  from  a  particular  population.  In  this  study  equal 
a  priori  probabilities  have  been  assumed. 

8. Further investigation of the Henrichon and Fu algorithm and comparison with Kendall's method. As mentioned in the study, this may be an improvement over Kendall's method. Professor Fu has kindly supplied a copy of the latest version of his computer program.

9. Calculation of the probabilities of misclassification for Kendall's method by means of nonparametric statistics, as is done by Henrichon and Fu in their consideration of generalized tolerance limits.

10. Study of the adequacy of the representation σ²[(1 - ρ)I + ρE_pp] for variance-covariance matrices in general.

11.  Improvement  of  the  estimation  technique  in  the  modified 
Bartlett  and  Please  method. 